Overview and Credits

Introduction to Project 

Goal and intended audience 

Rediscovering Biology was designed for high school biology teachers 
who have substantial knowledge of basic biology but who want to 
learn about important new discoveries of the last two decades. It was 
also designed so teachers could familiarize themselves with research 
methods and tools that will lead to new discoveries in the coming 
decade. In designing the project we asked: What do teachers already 
know? What information would help them better understand recent 
and future developments? 

This is not a curriculum development project and does not attempt to 
provide materials for use in the high school classroom. In most cases, 
the level of presentation is too advanced for those who are beginning 
the study of biology. But through exposure to the research methods 
and techniques used by today's scientists, and with an understanding 
of some important new concepts, we hope that teachers will gain a 
heightened appreciation of ideas they already teach, as well as an 
increased ability to incorporate new topics into their curriculum. 

Other users — such as college students, advanced high school students, 
professional scientists, graduate students from other fields, or well- 
educated laypersons — may also find this project useful. We welcome 
their use of these materials. 

The materials were designed to be used in various ways. Some 
individuals may want to learn about a single topic and study parts of 
one unit on their own. Some may join in small facilitator-led groups, 
such as professional development in-service sessions, to go over one or 
a group of related units. Others may choose to complete the entire 
course. For the latter group, graduate credit may be earned through 
Colorado State University. For information on earning credit or 
obtaining materials go to: 

http://www.learner.org/channel/workshops/graduate_credit.html. 

How topics were chosen 

The teachers and researchers on our advisory board each proposed ten 
to twenty areas of biology that they thought had undergone 
significant change in the preceding decade. The cumulative list was 
then combined and narrowed down to thirteen major unit topics that 
the group agreed would provide a good foundation for those wanting 
to learn about new developments in biology. 



Rediscovering Biology: Molecular to Global Perspectives



This is not a comprehensive treatment of the field of biology. It 
includes areas of study that may be entirely new to some, such as 
genomics and proteomics. It also includes more traditional topics, such 
as human evolution and neurobiology, which have changed 
substantially because of the application of new techniques. Indeed, a 
common theme throughout the project is the application of processes 
and techniques at the molecular level to enlighten studies of 
organisms, populations, or ecosystems. 

Assumptions about user knowledge 

We assume that users of this material have knowledge equivalent to 
that of someone with a bachelor's degree in a biological science. Most 
terms and concepts that are used in a high school biology text are not 
defined or explained. We also recognize that biology is a rapidly 
advancing field, and someone who graduated from college a decade 
ago could not have been exposed to some of what is taught today. 
Biology is also a huge field of study, and many students who graduate 
from college this year, even, will not have been exposed to all of the 
material collected here. Many of the concepts we explore can be found 
in a recent introductory college biology textbook such as Freeman's 
Biological Science, or Campbell and Reece's Biology. Users might find it 
useful to have access to such a text as an additional reference. 

Project Components 

Rediscovering Biology is a multimedia project. Each of the thirteen 
units comprises a half-hour video, an online text chapter, and a set of 
learning activities. The Web site provides access to all of these, as well 
as additional resources, including: 

• a glossary that serves as a navigational tool 
to other parts of the project 

• interactive case studies 

• transcripts from expert scientist interviews 

• animations from the videos and case studies 

• still images from the videos and text book 

The videos and the text chapters can be used independently; if both 
are used, it is possible to start with either one. We imagine that most 
users will watch the video first, then read the chapter, and then 
perhaps watch the video again. 

Each video includes interviews with two or more expert scientists. 
Through these interviews viewers will get a sense of how and why 
these scientists do their research, and will have a look at some of the 
equipment and techniques they use. In choosing experts to interview, 
we looked for those who are nationally and internationally 
recognized, regardless of their gender or ethnicity. Should you wish to 
know more about the work of a particular researcher featured in the 
videos, the full transcripts from the interviews with these experts are 
available on the Web site. 

The Web site both organizes the different components of the
project and serves as a place to go for additional information. On the Web, a
comprehensive glossary defines important terms used throughout the 
series, and provides links to text and animations where these concepts 
are explained or used. Animations from the videos are available on the 
Web site so that users may study them in more detail, playing them 
repeatedly or pausing in the middle to study them. Transcripts of
interviews of scientists provide a rare opportunity to get to know 
scientists who are associated with many of the leading discoveries in 
biology today and understand their research. 

Order of units 

Users may decide to study all thirteen units or they may be interested 
in a single one. Each unit is meant to stand alone, but we often refer 
to ideas and techniques presented in other units. We have organized 
the units so that techniques such as microarray analysis or BLAST 
searches, which are used in several units, are explained early in the 
series. An HTML version of the text is available on the Web; from it users
may navigate through the various units and the different components. 

Online text 

The online text chapters are not simply a repeat of what is in the 
video. Rather, they show how information from the video fits into the 
larger field. In other words, they provide context for the focused 
examples presented in the video. One central theme present in nearly 
all of the chapters of the online text is the role that genetics and 
genomic studies have had in increasing our understanding of the 
various fields of biology. 

Each chapter was written by one of three authors, selected for his or 
her knowledge of biology and ability to write clearly about that 
knowledge. All of these authors have taught at the college level. The 
chapters vary somewhat in style and level of difficulty; these 
differences result both from the nature of the material itself and
from differences among writers.

Authors 

Amy Does, PhD, is a microbiology instructor at Portland Community 
College in Portland, Oregon. In addition to teaching prenursing 
students, she provides professional development for elementary school 
teachers who conduct afterschool science clubs. She has developed 
exhibits for a science museum, designed science software for middle 
school students, and taught college-level biology online. Amy is the 
author of the Microbial Diversity, Emerging Infectious Diseases, HIV 
and AIDS, and Genetically Modified Organisms chapters. 

Norman A. Johnson, PhD, is an adjunct research assistant professor 
at the University of Massachusetts at Amherst. His research has focused 
on speciation and several other areas of evolutionary genetics. In 
addition to the University of Massachusetts, Norman has also taught at 
the University of Chicago and the University of Texas at Arlington. 
Norman served as the style editor for all thirteen chapters, and is the 
author of the Evolution and Phylogenetics, Genetics of Development, 
Human Evolution, Neurobiology, and Biodiversity chapters. 

Teresa Thiel, PhD, is a professor of biology at the University of 
Missouri-St. Louis. Her main interests are molecular biology, 
microbiology, and bioinformatics. She directs a program for high school 
teachers and students called "Science in the Real World: Microbes in 
Action" that includes a Web site of the same name. She teaches 
microbiology and microbial genetics to undergraduate and graduate 
students, and offers summer workshops in microbiology for teachers. 
Teresa is the author of the Genomics, Proteins and Proteomics, Cell
Biology and Cancer, and Biology of Sex and Gender chapters. 

Learning activities 

Each unit contains several learning activities tailored to the 
information in the unit. These activities include simple review and 
discussion questions; exercises that demonstrate how data are 
generated, interpreted, and applied; explorations of ethical issues; and 
consideration of how the information relates to other fields. Most of 
the activities assume the participants are familiar with the unit's video 
and online text. 

Case studies 

Four interactive Web-based case studies showcase a specific area of 
applied or basic research in cancer, comparative evolution, HIV, or 
genetically modified organisms. Each case study takes the participant 
through a series of steps in a research project. After viewing 
explanatory and background material on the project, the participant 
chooses an experiment to perform or a hypothesis to test. The case 
studies provide an interactive experience that complements the video 
and text chapters; and provide a window into the choices, challenges, 
compromises, and rewards associated with one area of biological 
inquiry. Each case study is an independent activity but may incorporate 
information from more than one unit. 

Because the case studies go into greater depth than the videos and 
texts, and rely on information from them, it is best to do them after 
completing the other components. The first Web page of each case 
study provides links to the videos and online texts that are relevant to 
the study. 

Writers 

Chris Tachibana, PhD, has taught undergraduate biology since 1992 
at Salt Lake Community College, Penn State University, and the 
University of Washington. She is a research scientist at the University of 
Washington Biochemistry Department and the Carlsberg Research Labs 
in Denmark. Chris developed two case studies: The Genetics of 
Resistance to HIV and Designing an Anti-Cancer Drug. She also 
authored the learning activities for the Genomics, Proteins and 
Proteomics, Emerging Infectious Diseases, HIV and AIDS, Cell Biology 
and Cancer, Biology of Sex and Gender, and Genetically Modified 
Organisms units. In addition, she produced the learning activity course 
guides for all thirteen units, and gave the learning activities for all units
a common voice. 

Andrea (Andi) White, PhD, is a postdoctoral research associate at the 
University of California, Berkeley. As a graduate student at the 
University of New Hampshire she was a teaching assistant for marine
ecology, honors biology, and economic botany, and a lab coordinator for
plant biology. Her current research interests focus on algal stress
physiology and biochemistry, and the generation of environmentally 
friendly, alternative fuel sources from green algae. Andi developed 
two case studies: Evolution of Tungara Frog Mating Calls and Plant 
Genetic Modification. She also authored learning activities for the 
Evolution and Phylogenetics, Microbial Diversity, Genetics of 
Development, Human Evolution, Neurobiology, and Biodiversity units. 

Norman A. Johnson, PhD, (see biography under Authors) also
contributed to the learning activities for the Evolution and 
Phylogenetics, Microbial Diversity, Genetics of Development, Human 
Evolution, Neurobiology, and Biodiversity units. 

Project Team 

Advisors 

In addition to determining the content of the units, our advisors and 
consultants have been actively involved in reviewing the material for 
all thirteen units throughout its development. Videos, animations, case 
studies, and text chapters have all been reviewed several times during 
their production for accuracy and to ensure that these materials are as 
useful as possible to the intended audience. 

Our primary advisors and consultants consisted of a team of eight 
scientists involved in teaching, curriculum development, and research. 

Mark Bloom, PhD, is a science educator at Biological Sciences 
Curriculum Study (BSCS). He has developed print and Web-based 
curriculum materials for students in middle school, high school, and 
college. Previously, he was the assistant director of the Dolan DNA 
Learning Center, where he ran workshop programs for high school and 
college teachers. He developed the first educational kits using the 
polymerase chain reaction and coauthored the college lab manual 
Laboratory DNA Science. Mark was lead advisor for the Genomics, 
Proteins and Proteomics, Cell Biology and Cancer, and Biology of Sex 
and Gender units. 

Steve Boyarsky is the coordinator of curriculum improvement and staff
development at the Southern Oregon Education Service District. Steve
coordinates professional development in a three-county region in 
southern Oregon. He taught high school biology and human 
anatomy/physiology for 18 years. Steve has been involved with state 
and national level biology education through the National Science 
Teachers Association, a congressional fellowship, grants, and 
curriculum projects. Steve commented on appropriateness of content, 
level, and style of all project components. 

Alan Dickman, PhD, is the biology curriculum director and an 
associate professor of biology at the University of Oregon. He has 
organized summer outreach programs in science for middle school, 
high school, and community college teachers, and has been involved in 
nationally funded programs to improve college-level biology 
education. Alan teaches introductory biology courses and an upper- 
division forest biology course. As lead scholar, Alan was responsible for 
final scholarly quality of all content of all project components. 

Marion Field Fass, ScD, is an associate professor of biology at Beloit 
College. She has been involved in curriculum reform efforts in biology 
through the BioQUEST Curriculum Consortium and the SENCER (Science 
Education for New Civic Engagements and Responsibilities) project of 
AAC&U. In 2002 she traveled to Kenya and Tanzania to work with 
professors who were developing undergraduate courses about the 
epidemic of HIV/AIDS and about its impact in their communities. Marion 
was lead advisor for the Microbial Diversity, Emerging Infectious 
Diseases, HIV and AIDS, and Genetically Modified Organisms units. 

Paula Henderson has taught biology at Newark High School in
Newark, Delaware since 1980, and received the Outstanding Biology 
Teacher award for Delaware in 1993. She has taught a course in human 
heredity and development at the University of Delaware, and is a 
coauthor of the NIH/BSCS module "The Brain: Understanding 
Neurobiology Through the Study of Addiction." Paula commented on 
appropriateness of content, level, and style of all project components. 

Patrick Phillips, PhD, is an associate professor of biology and a 
member of the Center for Ecology and Evolutionary Biology at the 
University of Oregon. His research focuses on theoretical and empirical 
studies of evolutionary genetics. He teaches foundations of biology, 
evolution, population genetics, and experimental design; and is the 
creator of the evolutionary biology Web site, EvoNet.org. Patrick was 
lead advisor for the Evolution and Phylogenetics, Genetics of 
Development, Human Evolution, Neurobiology, and Biodiversity units. 

John Postlethwait, PhD, is a professor of biology in the Institute of 
Neuroscience at the University of Oregon. His research interest is in 
developmental genetics; he and his group have discovered a genome 
duplication event that occurred before the vast radiation of teleost 
fish, which account for half of all species of vertebrates. His lab is 
currently investigating the genetic mechanisms that may help account 
for that explosion of biodiversity. The author of two non-majors 
textbooks for college students, John is committed to undergraduate 
education and has taught introductory biology to mostly non-biology 
majors since 1964. John provided critical assistance for the Genetics of 
Development unit and parts of several other units. 

Carol Wheeler is a biology teacher and department chair at Pine 
Creek High School in Colorado Springs, Colorado. She worked in 
medical research and was a certified histocompatibility technologist 
prior to teaching. She received a Christa McAuliffe grant to develop a
molecular biology course, and an Intel grant designed to help
students become eligible to compete in science fairs at the international level.
Carol commented on appropriateness of content, level, and style of all 
project components. 

Evaluation 

In addition to the guidance from our team of advisors and consultants, 
an independent formative evaluation of three of the thirteen units 
was conducted by RMC Research Corporation. RMC Research staff 
selected ten biology teachers and ten professional development 
providers, who varied with respect to geographic location, race and 
ethnicity, and background knowledge in biology. These reviewers 
provided helpful input on these three units while they were being 
developed; suggestions made on these units were generalized, where 
appropriate, to the other ten units. 

Funder 

Rediscovering Biology is funded by Annenberg/CPB, a partnership 
between the Annenberg Foundation and the Corporation for Public 
Broadcasting (CPB), which uses media and telecommunications to 
advance excellent teaching in American schools. Annenberg/CPB videos 
help teachers increase their expertise in their fields and improve their 
teaching methods. For information on obtaining Annenberg/CPB 
materials, go to www.learner.org or call 1-800-LEARNER. 

Producer

Oregon Public Broadcasting (OPB) is a highly experienced producer of 
educational content with expertise in both traditional and new media 
approaches to formal education, community outreach, and television 
production. 

OPB has produced many series for Annenberg/CPB, including THE 
UNSEEN UNIVERSE: An Introduction to Microbiology; A WORLD OF 
ART: Works in Progress, a series on contemporary artists; AMERICAN 
PASSAGES: A Literary Survey, a multimedia series for college students; 
and ARTIFACTS & FICTION, a professional development workshop series 
for teachers on interdisciplinary approaches to American literature. 
OPB has also been the co-producer for video series and digital 
materials to accompany several McGraw-Hill textbook publications. 

OPB has a long history of producing Web sites, teachers' guides, and 
other curriculum materials to accompany educational and PBS 
broadcast series. Working in close concert with national advisory 
boards, OPB's staff has produced curriculum materials in the 
humanities and sciences for a variety of grade levels and teacher 
professional development. 

OPB is also a major producer of PBS Primetime documentary series, and 
has created programming for NOVA, FRONTLINE, and other programs 
as well as numerous specials and limited series. 

Research Staff 

Rediscovering Biology would not be possible without the hard work of 
the research and production staff at Oregon Public Broadcasting. The 
research staff provided critical support for video producers, authors, 
and activity developers. 

Cindy Lefton has a bachelor's degree in zoology and a master's 
degree in mass communication with an emphasis on science writing 
and editing. She has served as the editor of a medical news magazine, 
and has edited several medical textbooks and journal articles. Her 
interests in science and nature have led to volunteer service as an
education coordinator for a wildlife rehabilitation facility, a zoo guide, 
and a science fair coordinator. 

Liza Nicoll earned a bachelor's degree in biology and a bachelor's 
degree in health science at the University of Massachusetts at Amherst 
in the spring of 2001. Since completing work on Rediscovering Biology 
she has continued to work in television production, researching for a 
world history documentary series. 

Stephani Sutherland earned a doctorate in neuroscience from the 
Vollum Institute at Oregon Health & Science University, where she 
coordinated an outreach program in public junior high and high 
schools called Kids Interested in Discovering Science (KIDS). Since 
leaving the research laboratory in 2001, she has worked as a science 
news reporter for the Los Angeles Times and traveled around the 
world. She now works for the Journal of Neuroscience and writes 
freelance science news for various journals. Stephani was also a
coauthor for the Neurobiology chapter of the online text. 

Interviewees

We are grateful to the many people who were willing to find
time for this project. The following people provided valuable
information to the project through interviews.

Genetically Modified Organisms 

Leon Corzine; David L. Dornbos, Jr., PhD; Rebecca J. Goldburg, PhD; 
Marion Nestle, PhD, MPH; Thomas E. Newberry; and Gary H. 
Toenniessen, PhD. 

Emerging Infectious Diseases 

Capt. Daniel Carucci, MD, PhD; Rita Colwell, PhD; Laurie Garrett; 
Stuart B. Levy, MD; Judith M. Martin, MD; and Lukas K. Tamm, PhD. 

Cell Biology and Cancer

Elizabeth Blackburn, PhD; Brian Druker, MD; Leland Hartwell, PhD; 
Mary-Claire King, PhD; and Robert Weinberg, PhD. 

Biology of Sex and Gender 

Holly Ingraham, PhD; David Page, MD; and Eric Vilain, MD, PhD. 

Genomics 

David Altshuler, MD, PhD; James Carrington, PhD; Jonathan Eisen, PhD; 
and Eric Lander, PhD. 

Proteins and Proteomics 

Hamid Bolouri, PhD; Ned David, PhD; Stanley Fields, PhD; Hunter 
Fraser; Aaron Hirsh; and Leroy Hood, PhD. 

Microbial Diversity 

Anne Camper, PhD; Bill Costerton, PhD; Dan Kotansky, PhD; Anna- 
Louise Reysenbach, PhD; Frank F. Roberto, PhD; Phil Stewart, PhD; 
and Paul Sturman. 

HIV and AIDS 

Edward Berger, PhD; Laurie Garrett; Jay Levy, MD; Rob Roy MacGregor, MD;
Erik Vonmuller; and David Weiner, PhD.

Evolution and Phylogenetics 

Phillip Gingerich, PhD; Timothy Read, PhD; and Carl Woese, PhD. 

Human Evolution 

Kari Stefansson, MD; Ian Tattersall, PhD; Ajit Varki, MD; 
and Christopher Wills, PhD. 

Neurobiology 

Wolfhard Almers, PhD; Fred Gage, PhD; Richard Huganir, PhD;
and John Williams, PhD. 

Biodiversity 

James Miller, PhD; Richard Ostfeld, PhD; Peter H. Raven, PhD; 
Eleanor Sterling, PhD; and G. David Tilman, PhD. 

Genetics of Development 

Judith Eisen, PhD; Markus Grompe, MD; John Incardona, PhD; Nipam 
Patel, PhD; and John Postlethwait, PhD. 






Additional Acknowledgements 

We would like to thank the following people at Oregon Public 
Broadcasting in addition to the researchers who made this 
project possible. 

Executive Producer Meighan Maloney; Production Manager Doug Brazil; 
Production Media Manager Catherine Stimac; Production Assignment 
Manager Joshua Wolfe; Web Developer John Kin; Web Assistant Ryan 
Servatius; Assistant Production Manager Mary Hager; Database 
Administrator Heather Chambers; and Copyeditor Jennifer Ingraham. 

The Rediscovering Biology video series was produced by Oregon Public 
Broadcasting's Educational Media Production Department. The creative 
team consisted of the following: Executives in Charge of Production 
David Davis and Jack Galmiche; Executive Producer Meighan Maloney; 
Producer/Writers Melissa Gerr, Nadine Jesling, Amanda Lowthian, and 
Eric Slade; Writer Andrew Holtz; Series Host Lew Frederick; Academic 
Director Alan Dickman; Production Assignment Manager Joshua Wolfe; 
Production Manager Doug Brazil; Production Media Manager 
Catherine Stimac; Researchers Cindy Lefton, Liza Nicoll, and Stephani 
Sutherland; Director of Production Services Milt Ritter; Manager of 
Production Scheduling Bill Dubey; Director of Engineering Information 
Dave Fulton; Assistant Director Sean Hutchinson; Assistant Production 
Manager Mary Hager; Pre-Production Coordinator Thea Bergeron; 
Videographers Art Adams, Karel Bauer, S.O.C., David Dennison, Paul 
Jacobson, Lisa Suinn Kallem, Jim Langley, Michael McNamara, Corky 
Miller, Ben Nieves, John Patzer, Todd Sonflieth, Dave Spangler, and 
Wally Szczubialk; Editors Tom Babich, Bruce Barrow, Sarah Marcus, 
Chris Nolan, John Patzer, and Kate Schoninger; Field Audio Michael A. 
Bidese, Chad Birmingham, Darren Brower, Kevin Brown, Chris Callus, 
Francis X. Coakley, Tony D'Annunzio, Thorn Dentler, Jay Farrington, 
Dave Foreman, Thomas Forliti, Gerry Formicola, G. John Garrett, Joel 
Groeblinghoff, Cindy Hogan, Chip Lake, Randy Layton, Gordon 
Masters, Casey Quinlan, C.A.S., Todd Schmidt, Brandt Sennhenn, Mike
Tyrey, Ted Ver Valen, Bill Ward, and Matt Yeasley; Creative Director Tim 
Bergmann; Production Artists Dora Papay, Corrina Reff, and Jefferson 
P. Vowell; 3-D Animations Hot Pepper Studios, Animation Dynamics, 
Inc., and Kevin Washington; Rights Assistant Morgan Currie; Theme 
Music Cal Scott; Production Intern Larry Johnson; Production Art 
Interns Soumalay Douangmala, Kim Harshberger, and Kevin Jaquette; 
Production Assistants Michael Aaris, David Banyan, Emily Chapman, 
Mike Forest, Kenyatta Gomez, Madeleine Pappas, Michelle Pridemore, 
Anastasia Savko, Alex Selkowitz, and Jonathan Zintel. 

The Rediscovering Biology Web Site was produced by the following 
creative team: Oregon Public Broadcasting Web Developer/Producer 
John Kin; Web Assistant Ryan Servatius; Database Administrator 
Heather Chambers; Project Coordination, Flash Interactive 
Development and Project Design AMAZING! Online Marketing, 
LLC in association with Moshofsky/Plant Creative Services and 
Bergmann Graphics. 

Genomics

"...the acquisition of the sequence is only the beginning.
The sequence information provides a starting point from
which the real research into the thousands of diseases
that have a genetic basis can begin." J. Craig Venter 1

The Human Genome Project

In 1986 Nobel laureate Renato Dulbecco challenged the
scientific community to sequence the complete human genome. "Its
significance," he said, "would be comparable to that of the effort that 
led to the conquest of space, and it should be carried out with the 
same spirit." 2 Dulbecco also argued that such a project should be "an 
international undertaking, because the sequence of the human DNA is 
the reality of the species, and everything that happens in the world 
depends upon those sequences." 

Like the conquest of space, sequencing the human genome required 
the development of wholly new technologies. The human genome, 
containing more than three billion nucleotides, is vast. In 1986 DNA 
sequencing had yet to be automated and, consequently, was slow and 
tedious. Moreover, computer software for sequence analysis was just 
being developed. Similar to the Apollo project that met President 
Kennedy's goal of a manned lunar landing by 1970, the genome 
project also succeeded — beyond the dreams of the scientists who 
proposed it. 

During the 1990s rapid progress was made in developing automated 
sequencing methods and improving computer hardware and software. 
By 2003 biologists had sequenced genomes from about one hundred 
different species. These species included dozens of bacteria and other 
microbes, as well as the model systems: yeast, fruit fly, nematode, and 
mouse. The capstone, of course, was the completion of the human 
genome sequence. In 2001 two rival teams jointly announced the 
completion of a draft sequence of the entire human genome, 
consisting of more than three billion nucleotides. 

Is human DNA "the reality of the species"? Do we now have all the 
information we need to define human life? Perhaps surprisingly, the 
answers are no. Genetics is more than just DNA. While DNA is the 
blueprint for life, proteins carry out most cellular functions; DNA just 
codes for RNA, which codes for protein. 

One major surprise emerged from the sequencing of the human
genome. Although some scientists expected to find at least 100,000
genes coding for proteins, only about 30,000-35,000 such genes
appear to be in the human genome. These genes comprise only about
two percent of the entire DNA. What is the rest of the DNA doing?
Biologists once thought that this noncoding DNA was just junk, and
hence called it "junk DNA." As we will see below, evidence now
suggests that some junk DNA may have functions.

The quest to understand the workings of human cells will not be over 
until we understand how this genetic blueprint is used to produce a 
particular set of proteins — the proteome — for each type of cell and 
how these proteins control the physiology of the cell. (See the Proteins 
and Proteomics unit.) We should think of the human genome as a 
database of critical information that serves as a tool for exploring the 
workings of the cell and, ultimately, understanding how a complex 
living organism functions. 

Sequencing a Genome 

Sequencing a genome is an enormous task. It requires not only finding 
the nucleotide sequence of small pieces of the genome, but also 
ordering those small pieces together into the whole genome. A useful 
analogy is a puzzle, where you must first put together the pieces of a 
smaller puzzle and then assemble those pieces into a much larger 
picture. Two general strategies have been used in the sequencing of 
large genomes: clone-based sequencing and whole genome 
sequencing (Fig. 1).

In clone-based sequencing (also known as hierarchical shotgun 
sequencing) the first step is mapping. One first constructs a map of the 
chromosomes, marking them at regular intervals of about 100 
kilobases (kb). Then, known segments of the marked chromosomes 
(which can contain very small fragments of DNA) are cloned in 
plasmids. One special type of plasmid used for genome sequencing is 
a BAC (bacterial artificial chromosome), which can contain DNA 
fragments of about 150 kb. The plasmid's fragments are then further 
broken into small, random, overlapping fragments of about 0.5 to 
1.0 kb. Finally, automated sequencing machines determine the order
of each nucleotide of the many small fragments. 
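
To get a feel for the scale of this step, the sketch below estimates how
many random fragments must be sequenced to cover one BAC clone several
times over. The function name reads_needed, the 750-base average fragment
length, and the eightfold coverage are illustrative assumptions, not
figures taken from the sequencing projects themselves.

```python
import math


def reads_needed(clone_length_bp: int, read_length_bp: int, coverage: float) -> int:
    """Estimate how many random fragments give the requested average coverage.

    Coverage is the average number of times each base of the clone is read.
    """
    # Total bases that must be sequenced to reach the requested coverage.
    total_bases = coverage * clone_length_bp
    # Round up, because a fraction of a fragment cannot be sequenced.
    return math.ceil(total_bases / read_length_bp)


# Assumed values: a 150 kb BAC insert, 750-base fragments, eightfold coverage.
print(reads_needed(150_000, 750, 8))
```

Under these assumed values a single 150 kb clone calls for 1,600
sequenced fragments, which is why automation was essential.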

Data management and analysis are critical parts of the process, as 
these sequencing machines generate vast amounts of data. As the data 
are generated, computer programs align and join the sequences of 
thousands of small fragments. By repeating this process with the 
thousands of clones that span each chromosome, researchers can 
determine the sequences of all the larger clones. Once they know the 
order of all the larger clones, the researchers can join the clones and 
determine the sequence of each chromosome. 
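
The aligning and joining step can be illustrated with a deliberately
simplified sketch. It merges, again and again, the two fragments that
share the longest end-to-end overlap. The function names, the tiny example
sequences, and the minimum overlap of three bases are assumptions made for
illustration. Real assembly programs must also cope with sequencing
errors, repeated sequences, and both DNA strands, all of which this
sketch ignores.

```python
def overlap(a: str, b: str, min_len: int = 3) -> int:
    """Length of the longest suffix of a that matches a prefix of b."""
    for k in range(min(len(a), len(b)), min_len - 1, -1):
        if a.endswith(b[:k]):
            return k
    return 0  # no overlap of at least min_len bases


def drop_contained(seqs: list[str]) -> list[str]:
    """Remove fragments that lie entirely inside another fragment."""
    return [s for s in seqs if not any(s != t and s in t for t in seqs)]


def greedy_assemble(fragments: list[str], min_len: int = 3) -> list[str]:
    """Greedily merge the pair of fragments with the longest overlap."""
    # Drop exact duplicates (keeping order), then contained fragments.
    contigs = drop_contained(list(dict.fromkeys(fragments)))
    while len(contigs) > 1:
        best_len, best_i, best_j = 0, -1, -1
        for i, a in enumerate(contigs):
            for j, b in enumerate(contigs):
                if i != j:
                    k = overlap(a, b, min_len)
                    if k > best_len:
                        best_len, best_i, best_j = k, i, j
        if best_len == 0:
            break  # the remaining pieces do not overlap
        # Join the two fragments, writing the shared bases only once.
        merged = contigs[best_i] + contigs[best_j][best_len:]
        rest = [c for n, c in enumerate(contigs) if n not in (best_i, best_j)]
        contigs = drop_contained(rest + [merged])
    return contigs


# Three overlapping fragments of a made-up 20-base sequence, out of order.
fragments = ["TTAGCCTAGGA", "ATGCGTACG", "GTACGTTAGC"]
print(greedy_assemble(fragments))
```

If the fragments do not overlap at all, the function returns them as
separate pieces, much as gaps remain in a draft genome sequence.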

Finding the sequence of the smaller clone fragments is relatively easy. 
The challenge is assembling all the pieces. The National Human 
Genome Research Institute (the public consortium headed by Francis 
Collins) used clone-based sequencing for the human genome. In doing 
so, they relied heavily on the work of computer scientists to assemble 
the final sequence. 

Whole genome shotgun sequencing skips the mapping step of
clone-based sequencing. Instead, it (1) clones millions of the genome's
small fragments in plasmids, (2) sequences all of these small
overlapping fragments, and then (3) uses computers to find matches
and join them together. Celera Genomics, a private company headed by
J. Craig Venter, used this approach to sequence the human genome.
Although Celera started much later than the public consortium, it
completed its draft sequence at about the same time; however, it had
the advantage of having access to all the consortium's maps.

Figure 1. Strategies for cloning whole genomes.

Genome sequencing projects now generally use some combination 
of chromosome mapping, and clone-based and whole genome 
shotgun sequencing of smaller fragments. The technology 
developed for sequencing the human genome — both in terms of 
sequencing DNA and in the software and hardware used to assemble 
the sequences into a genome — has resulted in the rapid sequencing 
of many other genomes.

Finding Genes

Imagine the genome as an encyclopedia with a volume for each 
chromosome. If you were to open a volume, you would find page after 
page containing only four letters — A, T, G, and C — without spaces or 
punctuation. How could you read such a book, or even identify 
possible words and sentences? The genome sequence itself does not 
provide direct information on the location of a gene, but there are 
clues embedded in the sequence that computer programs can find. 

Most simple gene prediction programs use several pieces of sequence
information to identify a potential gene in a DNA sequence. The
programs look for sequences in the DNA that have the potential to
encode a protein. These sequences are called open reading frames
(ORFs). An ORF usually begins with the start codon AUG (Fig. 2), and then
contains a long sequence of codons that specify the protein's amino
acids. The ORF ends with a stop codon of UAA, UAG, or UGA. (In the DNA
sequence, these codons appear with T in place of U.) Reading the
sequence in frames of three nucleotides each, the computer program
scans the sequence until it identifies an ORF region. Because reading
can begin at the first, second, or third nucleotide, there are three
possible frames on each strand. For example, the sequence
"abcdefghijk" could be read in three-letter "words" of "abc-def-ghi,"
"bcd-efg-hij," or "cde-fgh-ijk." Computer programs can scan
DNA sequences quickly, using these overlapping reading frames on both
the original strand and on the complementary strand, producing a total
of six different reading frames for any sequence.

Figure 2. To find an open reading
frame (ORF), a computer program
identifies start codons (red arrows) and
stop codons (green lines) in all three
reading frames (represented by the
three stacked rows). The black box is
the largest ORF found in this sequence.
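
The search described above can be sketched in a few lines of Python. This is a minimal illustration, not the algorithm of any particular gene prediction program. It assumes an intron-free DNA sequence written with the letters A, C, G, and T, and the function names and the minimum ORF length are hypothetical choices.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(dna: str) -> str:
    """Return the complementary strand, read in its own 5'-to-3' direction."""
    return dna.translate(COMPLEMENT)[::-1]


def orfs_in_frame(dna: str, frame: int, min_codons: int):
    """Yield (start, end) positions of ORFs in one reading frame.

    An ORF runs from the first ATG to the next in-frame stop codon.
    min_codons counts the start and stop codons as well.
    """
    start = None
    for i in range(frame, len(dna) - 2, 3):
        codon = dna[i:i + 3]
        if start is None and codon == "ATG":
            start = i
        elif start is not None and codon in STOP_CODONS:
            if (i + 3 - start) // 3 >= min_codons:
                yield start, i + 3
            start = None


def find_orfs(dna: str, min_codons: int = 3):
    """Search all six reading frames; return (strand, frame, sequence)."""
    dna = dna.upper()
    found = []
    for strand, seq in (("+", dna), ("-", reverse_complement(dna))):
        for frame in range(3):
            for start, end in orfs_in_frame(seq, frame, min_codons):
                found.append((strand, frame, seq[start:end]))
    return found


print(find_orfs("CCATGAAATTTGGGTAACC"))  # [('+', 2, 'ATGAAATTTGGGTAA')]
```

Real programs set the minimum length much higher, because short ORFs occur often by chance.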

Using these programs to find ORFs in bacterial genomes is relatively
easy because the DNA sequence matches the mRNA. The situation is
more complicated for eukaryotic genes, which often contain one or
more noncoding regions (introns). In the cell, the introns are removed
from the RNA transcript in a process called splicing (Fig. 3), which
joins the coding regions of the gene (exons) together into the final
mRNA. The final spliced mRNA, which encodes the protein product of the
gene, is therefore smaller than the original RNA transcript that
matches the genome. The problem is that a simple ORF-finding program
cannot be used with genomic DNA that has introns because the sequence
of those genes does not match the mRNA. While computer programs can
identify eukaryotic genes with introns, they are not always accurate.

Figure 3. A gene consists of coding
regions, called exons, that are 
interrupted with intervening noncoding 
regions, called introns. During 
transcription, the whole segment of 
DNA that corresponds to a gene is 
copied to make RNA. During RNA 
processing, the introns are removed and 
the exons are joined. A poly(A) tail is 
added to the mRNA. 
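
The effect of splicing on the sequence can be shown with a short Python sketch. It assumes that the exon positions are already known, which is exactly the information a gene-finding program has to work out. The coordinates, the example transcript, and the function names are hypothetical.

```python
def splice(transcript: str, exons: list[tuple[int, int]]) -> str:
    """Join the exon segments in order, dropping everything between them.

    Exon positions are 0-based (start, end) pairs; end is not included.
    """
    return "".join(transcript[start:end] for start, end in sorted(exons))


def introns_between(exons: list[tuple[int, int]]) -> list[tuple[int, int]]:
    """Return the (start, end) positions of the gaps between exons."""
    ordered = sorted(exons)
    return [(prev_end, next_start)
            for (_, prev_end), (next_start, _) in zip(ordered, ordered[1:])]


# A made-up transcript: exon 1, a 10-nucleotide intron, exon 2
transcript = "AUGGCU" + "GUAAGUUUAG" + "GAAUAA"
exons = [(0, 6), (16, 22)]

print(introns_between(exons))     # [(6, 16)]
print(splice(transcript, exons))  # AUGGCUGAAUAA
```

The spliced sequence reads AUG-GCU-GAA-UAA, a complete ORF. In the unspliced transcript the intron shifts the reading frame, so the same stop codon is no longer read in frame.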

An alternative approach to characterizing genes in eukaryotes is to first
make a DNA copy of the mRNA encoded by the gene. To do this, one
uses an enzyme called reverse transcriptase. The copy, called cDNA or
complementary DNA, has the same sequence as the mRNA, except that
the U is replaced by a T. Because the cDNA lacks introns, the sequence
of the cloned cDNA can be used to find an ORF. In addition to simply
identifying ORFs, many advanced sequence analysis programs use
other information to help identify eukaryotic genes in the
chromosome. (See the BLAST section below.)

Is the Eukaryotic Genome a
Vast Junkyard?

Bacteria have small, compact genomes, rich in genes. Bacterial genomes
have few noncoding regions, and their genes generally lack introns.
Eukaryotic genomes, however, often have much more DNA than prokaryotic
genomes. While eukaryotes generally have more genes than bacteria, the
difference in gene content is not as great as the difference in DNA
content: there is much more noncoding DNA in eukaryotes. In fact,
protein-coding regions make up only about two percent of the human
genome.

Most eukaryotic genes are interrupted by large introns. Even with 
these introns included, however, genes comprise only about twenty- 
five percent of the human genome. In eukaryotes, repeated sequences 
characterize great amounts of noncoding DNA. Some of this repetitive 
DNA is dispersed more or less randomly throughout the genome. There 
are also millions of copies of other, shorter repeats, but they are 
typically found in larger blocks. Some trinucleotide (3 bp) repeats are 
associated with diseases such as fragile X and Huntington's disease, 
which result from extra copies of the repeat sequence. 

Most of these repeat sequences are transposable elements, which can
replicate and insert a copy in a new location in the genome. The result
is the amplification of these repetitive elements over time. Transposable
elements can be harmful because they can cause mutations when they
move into a gene. They also use cellular resources for replication and
expression. Are these elements unwelcome guests gone wild, or might
they actually be useful components of the genome? We don't really
know, but there are some tantalizing suggestions of functions for some
of these elements. About one million copies of the repetitive DNA
element called Alu lurk in the genome of each one of us. What
are they doing? One study found that these elements bind to proteins
used to reshape chromatin during cell division. Perhaps this apparent
junk DNA is actually helping to provide structure to the chromosome and
to regulate the production of proteins in different cell types.

Genomes differ in size, in part because they have different proportions
of repetitive DNA. For example, the total genome size of the puffer
fish is about one-tenth the size of the human genome. However, the
puffer fish genome has about the same number of genes as the human
genome, and the genes appear to have the same functions. The puffer
fish genome is smaller partly because it contains only about fifteen
percent repetitive DNA, while more than half the human genome is
repetitive DNA. Because most human genes are present in the puffer fish
and the puffer fish genome is less cluttered by repetitive DNA, this
model organism may help scientists identify the genes responsible for
human diseases.

The Difference May Lie Not in the 
Sequence but in the Expression 

Most genes are shared across all animals. More than ninety-nine 
percent of human genes have a related copy in the mouse. As one 
examines animals that are more distantly related, the proportion of 
the genes they share decreases; however, despite about 500 million 
years of evolutionary separation, half the genes in the lowly sea squirt
correspond to those found in humans. This remarkable conservation of
gene structure is striking considering how much these animals differ in 
morphology, physiology, and behavior. 

If they share so many of the same genes, why are different animals so 
different? Differences among species result largely from differences in 
the time and location of the genes' expression. Let us consider our 
closest relative, the chimpanzee. Not only do chimpanzees and humans 
share nearly all of the same genes, but the DNA sequences of those 
genes also are very similar between the two species. Svante Pääbo
sequenced three million bases of the chimp genome and found that
chimps and humans differ overall by less than two percent at the
sequence level. (See the Human Evolution unit.) Based on the low
sequence divergence, Pääbo hypothesized that the difference between
humans and chimpanzees was due mainly to how the genes were
expressed in the different species.

To test this hypothesis, Pääbo compared the expression pattern of
20,000 human genes in humans and chimps. He found that while
expression levels were similar in liver cells and blood, there were larger
differences in brain cells. This suggests that the human brain has
increased the use of certain genes compared to those same genes in a
chimp. So it is not so much the sequence of the genes that determines
the unique characteristics of each organism, but how those genes are
expressed to make the cell's proteins.

Determining Gene Function 
from Sequence Information 

Researchers have produced an enormous number of genome 
sequences from a variety of organisms. Publicly available databases, 
such as GenBank at the NCBI (National Center for Biotechnology 
Information), store many of these sequences. The databases have been 
a tremendous boon for comparative biology. The NCBI database stores 
not only the genome sequences, but also information about the 
function (if it is known) of the genes. 

Tools at the NCBI can also help identify unknown genes by comparing them
with known genes in the database. One program commonly used for this
purpose is BLAST (Basic Local Alignment Search Tool). Sequence
similarity searching algorithms like BLAST are based on the premise that 
if two sequences are similar then they are likely to be homologous 
(that is, they share a common evolutionary ancestor). (See the Evolution 
and Phylogenetics unit.) Using this database, one can infer the function 
of an unknown gene by finding similar sequences of known genes and 
proteins. For example, suppose you were to use BLAST to search for 
sequences similar to a new gene. Upon viewing your results, you 
noticed that all the sequences with a high degree of similarity to the 
new gene belonged to a family of genes known to break down 
hydrogen peroxide. You could logically conclude, then, that this new 
gene encoded a protein with a similar function. 

BLAST searches can be done at the nucleotide level; however,
comparisons at the amino acid level provide much greater sensitivity.
Therefore, unless one is particularly interested in the DNA sequence
itself, it is better to search for genes using protein sequences. If you
have only raw nucleotide sequence data, computer programs can automatically
translate the DNA into amino acids using all six reading frames (three
frames from one strand and three frames from the complementary
strand) before searching the protein database.
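
A six-frame translation can be sketched as follows. This is an illustration of the idea and not the code that BLAST itself uses. It assumes the standard genetic code and a DNA sequence written with A, C, G, and T; the function names and frame labels are hypothetical.

```python
from itertools import product

# Standard genetic code, with codons listed in the order TTT, TTC, TTA, TTG, TCT, ...
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def translate(dna: str) -> str:
    """Translate complete codons from the first base; '*' marks a stop codon."""
    return "".join(
        CODON_TABLE.get(dna[i:i + 3], "X")  # X = codon with an unknown base
        for i in range(0, len(dna) - 2, 3)
    )


def six_frame_translation(dna: str) -> dict[str, str]:
    """Return the translations of three frames on each strand."""
    dna = dna.upper()
    reverse = dna.translate(COMPLEMENT)[::-1]
    frames = {}
    for offset in range(3):
        frames[f"+{offset + 1}"] = translate(dna[offset:])
        frames[f"-{offset + 1}"] = translate(reverse[offset:])
    return frames


for name, protein in six_frame_translation("ATGGCCTAA").items():
    print(name, protein)
```

Each of the six translations can then be compared with the protein database, so a match is found no matter which strand and frame the gene uses.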

In addition to whole proteins, similarity searches can identify protein 
motifs. A motif is a distinctive pattern of amino acids, conserved 
across many proteins, which gives a particular function to the protein. 
For example, the presence of one particular motif in a protein indicates 
that this protein probably binds ATP and may therefore require ATP for 
its action. 
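
A motif search can be illustrated with a pattern match. The sketch below uses one common form of the ATP/GTP-binding "P-loop" (Walker A) motif: A or G, then any four amino acids, then G, K, and finally S or T. The protein sequence in the example is made up, and real motif databases use more elaborate descriptions than this single pattern.

```python
import re

# One common form of the P-loop (Walker A) motif: [AG]-x(4)-G-K-[ST]
P_LOOP = re.compile(r"[AG].{4}GK[ST]")


def find_motif(protein: str, pattern: re.Pattern = P_LOOP) -> list[tuple[int, str]]:
    """Return (position, matched text) for each non-overlapping match."""
    return [(m.start(), m.group()) for m in pattern.finditer(protein.upper())]


# A hypothetical protein fragment written in one-letter amino acid codes
print(find_motif("MKLAGPSGSGKSTLLR"))  # [(4, 'GPSGSGKS')]
```

A match suggests, but does not prove, that the protein binds a nucleotide such as ATP.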

The result of a database search is a list of matches, ranked from
highest to lowest score, based on the likelihood that each match is
significant (Fig. 4). The reported alignment scores are given
"expectation values" (E), which represent the number of matches with
the reported score or better that would be expected to occur by random
chance in a database of that size. The higher the score, the smaller
the E-value and the less likely that the match is a coincidence. Some of
the easiest results to interpret are very high scores (very small
E-values), which usually result from two very similar proteins. Other
easily identifiable results are very low scores, which indicate that
the outcome is probably the result of chance similarity.
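
The link between score and E-value can be made concrete. For scores expressed in bits, the expected number of chance matches is approximately E = m x n x 2^(-S), where m is the length of the query, n is the total length of the database, and S is the bit score. The sketch below applies that approximation. It ignores the length corrections that BLAST applies, and the query and database sizes are hypothetical, so it is not expected to reproduce the values in Figure 4.

```python
def expected_chance_hits(bit_score: float, query_len: int, db_len: int) -> float:
    """Approximate E-value for a bit score: E = m * n * 2**(-S)."""
    return query_len * db_len * 2.0 ** (-bit_score)


# Hypothetical sizes: a 147-residue query and a database of one billion residues
for score in (30, 60, 120):
    print(score, expected_chance_hits(score, 147, 10**9))
```

Each additional bit of score halves the E-value, which is why scores only a few bits apart in a results list can have noticeably different E-values.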



Figure 4. The results of a BLAST search
using the delta chain of hemoglobin as
the query.

[BLAST output. The list of sequences producing significant alignments
begins with human and chimpanzee delta globin (scores of about 128 bits,
E-values of 2e-29), followed by hemoglobin beta chains (scores of about
120 to 122 bits, E-values of 1e-27 to 3e-27). Below the list is the
alignment of the query with the hemoglobin beta chain of the hanuman
langur (length 146): score 122 bits (305), Expect = 2e-27, identities
57/60 (95%), positives 57/60 (95%).]



Search results also provide links (in blue) to a database page with 
information on each sequence similar to the query sequence. This page 
gives extensive information on the match sequence, including the 
organism it came from, the function of the gene product (if it is 
known), and references to journal articles concerning the sequence. 
BLAST results also provide the actual alignment results for nucleotides 
or amino acids between the query sequence and the match sequences.

The Virtues of Knockouts

Gene prediction programs have been valuable in the preliminary 
identification of genes; however, they have limitations. Unless the 
gene of interest is homologous to a gene of known function, the 
function is generally still not known. A biological approach to 
determining the function of a gene is to create a mutation and then 
observe the effect of the mutation on the organism. This is called a 
knockout study. While it is not ethical to create knockout mutants in 
humans, many such mutants are already known, especially those that 
cause disease. One advantage of having a genome sequence is that it 
greatly facilitates the identification of genes in which mutations lead 
to a particular disease. 

The mouse, in which one can make and characterize knockout mutants,
is an excellent model system for studying genetic diseases of humans;
its genome is remarkably similar to the human genome. Nearly all human genes
have homologs in mice, and large regions of the chromosomes are very 
well conserved between the two species. In fact, human chromosomes 
can be (figuratively) cut into about 150 pieces, mixed and matched, 
and then reassembled into the 21 chromosomes of a mouse. Thus, it is 
possible to create mutants in mice to determine the probable function 
of the same genes in humans. Genetic stocks of mutant mice have 
been developed and maintained since the 1940s. 

One goal of the mouse genome project is to make and characterize 
mutations in order to determine the function of every mouse gene. 
After a particular gene mutation has been linked to a particular 
disorder, the normal function of the gene may be determined. An 
example of this approach is the mutated gene that resulted in cleft 
palates in mice. The researchers found that the gene's normal function 
is to close the embryo's palate. An understanding of the genetics 
behind cleft palate in mice may one day be used to help prevent this 
common birth defect in humans. 

Genetic Variation Within Species 
and SNPs 

A polymorphism, the existence of two or more forms of a sequence
among different individuals of the same species, can arise from a
change in a single nucleotide. These single nucleotide
polymorphisms (SNPs) account for ninety percent of all
polymorphisms in humans. The number of SNPs between two genomes
provides a measure of sequence variation; however, the variation is not
uniform over the genome. About two-thirds of SNPs are in noncoding
DNA, and SNPs tend to be concentrated in certain locations in the
chromosome. In addition, sex chromosomes have a lower concentration
of SNPs than autosomes.

There are about three million SNPs in the human genome, or about 
1 per 1000 nucleotides. SNPs are ideal genetic markers for many 
applications because they are stable, widespread, and can often be 
linked to particular characteristics (phenotypes) of interest. They are 
proving to be among the most useful human markers for studies of 
evolutionary genetics and medicine.

Not all SNPs, even when they are present in coding genes, lead to
visible or phenotypic differences among individuals. Changes in the
DNA sequence don't always change the amino acid sequence of the
protein. For example, a change from GGG to GGC results in no change
in the protein because both codons specify glycine. This is called a
synonymous mutation or silent mutation; nonsynonymous substitutions
do cause a change in the amino acid. About half of all SNPs in genes
are nonsynonymous and therefore can account for diversity between
individuals or populations. Depending on the particular change in an
amino acid caused by a nonsynonymous mutation, the resulting protein
may be active, inactive, or partially active. It may also be active in
a different way.
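
The distinction can be checked mechanically by translating the codon before and after the change. The sketch below assumes the standard genetic code and DNA codons written with A, C, G, and T. The function name is hypothetical, and the second example (cysteine to tyrosine) is chosen only for illustration.

```python
from itertools import product

# Standard genetic code, with codons listed in the order TTT, TTC, TTA, TTG, TCT, ...
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def classify_substitution(ref_codon: str, alt_codon: str) -> tuple[str, str, str]:
    """Return (kind, original amino acid, new amino acid) in one-letter codes."""
    ref_aa = CODON_TABLE[ref_codon.upper()]
    alt_aa = CODON_TABLE[alt_codon.upper()]
    kind = "synonymous" if ref_aa == alt_aa else "nonsynonymous"
    return kind, ref_aa, alt_aa


print(classify_substitution("GGG", "GGC"))  # ('synonymous', 'G', 'G')
print(classify_substitution("TGC", "TAC"))  # ('nonsynonymous', 'C', 'Y')
```

In the one-letter code G is glycine, C is cysteine, and Y is tyrosine; an asterisk marks a stop codon.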

One well-characterized SNP exists in a gene on chromosome 6.
Individuals with cysteine at amino acid position 282 are healthy;
however, about 1 in 200 to 400 Caucasians of Northern European descent
possess two copies of that gene in which the amino acid is tyrosine
instead of cysteine. Due to this one change, these individuals have a
disease called hereditary hemochromatosis. People afflicted with this
disease accumulate high levels of iron, which causes permanent
damage to the organs, especially the liver. About ten percent of people
in this population carry only one copy of this mutation; they are
heterozygous and are carriers of the disease. A genetic test for hereditary
hemochromatosis is available, which can detect the SNP. If the mutation
is found, medical professionals can then determine whether the person
is homozygous or heterozygous for this allele. Another example of a
single SNP that has a dramatic effect is the one that leads to sickle cell
anemia. (See the Human Evolution unit.)
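
The two frequencies quoted above, roughly 1 in 200 to 400 people with two copies and about ten percent with one copy, can be cross-checked with a short calculation. The sketch assumes that the alleles combine at random in the population (the Hardy-Weinberg assumption), which the text does not state, so the result is only a rough consistency check. The function name is hypothetical.

```python
import math


def carrier_frequency(homozygote_frequency: float) -> float:
    """Estimate the fraction of carriers from the fraction of homozygotes.

    If the mutant allele has frequency q, homozygotes occur at q**2
    and carriers (one mutant copy, one normal copy) at 2 * q * (1 - q).
    """
    q = math.sqrt(homozygote_frequency)
    return 2 * q * (1 - q)


for one_in in (200, 400):
    print(f"1 in {one_in}: {carrier_frequency(1 / one_in):.1%} carriers")
```

Both estimates come out close to one person in ten, in line with the carrier figure given in the text.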



Identifying and Using SNPs 

In order to identify SNPs, nucleotide sequences of two or more 
genomic regions must be aligned so that the polymorphisms are 
apparent. Sequence alignments are easy when the sequences are 
similar, but can be very difficult when there are many polymorphisms. 
The alignment of two sequences is determined by a program that 
compares the two sequences, nucleotide by nucleotide. For multiple 
sequences, the program continues the same type of pairwise alignment 
for all possible pairs. The result is a pairwise distance matrix based on 
all possible alignments of any two sequences. This matrix is then used 
to construct a phylogenetic tree that predicts how closely related two 
sequences are, based on their similarity. The program then uses this 
information to align the sequences, again in order of their relatedness. 
This is the method used in a program called CLUSTAL. A typical output 
from CLUSTAL is shown in Figure 5. 
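
Two of the steps described above can be sketched in Python: building the pairwise distance matrix and picking out the positions where the sequences differ. The sketch is not CLUSTAL. It assumes the sequences have already been aligned to the same length, and the sequences and function names are hypothetical.

```python
from itertools import combinations


def p_distance(a: str, b: str) -> float:
    """Fraction of aligned positions that differ, ignoring gap positions."""
    pairs = [(x, y) for x, y in zip(a, b) if x != "-" and y != "-"]
    return sum(x != y for x, y in pairs) / len(pairs)


def distance_matrix(aligned: dict[str, str]) -> dict[tuple[str, str], float]:
    """Pairwise distances for every pair of sequences."""
    return {
        (n1, n2): p_distance(aligned[n1], aligned[n2])
        for n1, n2 in combinations(aligned, 2)
    }


def variable_columns(aligned: dict[str, str]) -> list[int]:
    """Positions (0-based) where the sequences are not all identical."""
    return [i for i, column in enumerate(zip(*aligned.values()))
            if len(set(column)) > 1]


aligned = {
    "person1": "ACGTACGT",
    "person2": "ACGTACGT",
    "person3": "ACCTACGA",
}
print(distance_matrix(aligned))
print(variable_columns(aligned))  # [2, 7]
```

The variable columns are the candidate SNPs. The distance matrix is the starting point for the tree that decides the order in which the sequences are aligned.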



Figure 5. A CLUSTAL alignment of a 
segment of a gene from four species. 
The red letters show the amino acid 
sequence (R=arginine, P=proline, 
G=glycine, etc.). The nucleotides that 
are conserved in all four species are 
shown in the columns with an asterisk 
at the bottom. 

Human    CGGCCGCCGGGCAAGAGCGGCAAGTACTACTACCAGCTCAACAGCAAGAAGCACCAC  642
Mouse    CCCCCGCCAGG-AAGAGCGGCAAGTATTATTATCAGCTAAATAGCAAAAAGCACCAC  614
Chicken  CAGTCCCACAGCAAG---GGCAAGTACTACTACCAGCTCAACAGCAAGAAGCACCAC  583
Frog     CATTCCAGTAACAAG---AAAAAATACTATTATCAGCTCAATAGCAAAAAACATCAT  5DD

Most SNPs have no effect on an individual, so what use is a
map of them? SNPs appear to cluster in blocks called haplotypes.
Grouping individuals that share a particular haplotype is called
haplotyping. Because these particular sequences of SNPs on a
chromosome are inherited together as blocks, they can be used to
distinguish individuals and populations. What good is haplotyping?
One can determine what specific diseases or other traits are associated
with different haplotypes. In most cases, there are far fewer
haplotypes than SNPs. Although it is the SNPs that actually cause
disease, looking for changes in one SNP out of millions in the genome
is not practical; looking for a particular haplotype is much easier.
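
The idea of haplotyping can be sketched with a small example. Each person is represented here by the alleles found at three neighboring SNP sites on one chromosome copy. This is a simplification, since people carry two copies of each chromosome. The data, the names, and the disease status are all made up.

```python
from collections import defaultdict


def group_by_haplotype(haplotypes: dict[str, str]) -> dict[str, list[str]]:
    """Group individuals that carry the same combination of SNP alleles."""
    groups = defaultdict(list)
    for person, haplotype in haplotypes.items():
        groups[haplotype].append(person)
    return dict(groups)


def affected_fraction(groups: dict[str, list[str]], affected: set[str]) -> dict[str, float]:
    """Fraction of each haplotype group that has the trait of interest."""
    return {
        haplotype: sum(person in affected for person in people) / len(people)
        for haplotype, people in groups.items()
    }


haplotypes = {"p1": "ACT", "p2": "ACT", "p3": "GCT",
              "p4": "GTA", "p5": "GTA", "p6": "GTA"}
affected = {"p4", "p5"}

groups = group_by_haplotype(haplotypes)
print(groups)
print(affected_fraction(groups, affected))
```

Here six people fall into only three haplotype groups, and the trait appears only in the GTA group. A real study would need many more individuals and a statistical test before drawing any conclusion.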

An example of the value of haplotyping comes from research on Crohn's
disease. Crohn's disease is a chronic inflammatory disease of the 
digestive tract that tends to cluster in families. Researchers identified a 
haplotype on chromosome 5 that correlates with the disease. This 
region of the chromosome contains genes involved in immunity; these 
genes then may be important in other inflammatory diseases, such as 
lupus or asthma. 

Practical Applications of Genomics 

Genome sequence data now provide tools for the development of 
practical uses for genetic information. DNA is an invaluable tool in 
forensics because — aside from identical twins — every individual has a 
unique DNA sequence. Repeated DNA sequences in the
human genome are sufficiently variable among individuals that they 
can be used in human identity testing. The FBI uses a set of thirteen 
short tandem repeat (STR) DNA sequences for the Combined DNA 
Index System (CODIS) database, which contains the DNA fingerprint 
or profile of convicted criminals. Investigators of a crime scene can use 
this information in an attempt to match the DNA profile of an 
unknown sample to a convicted criminal. DNA fingerprinting can also
identify victims of crime or catastrophes, as well as many family 
relationships, such as paternity. While we think of forensics in terms of 
identifying people, it can also be used to match donors and recipients 
for organ transplants, identify species, establish pedigree, or even 
detect organisms in water or food. (See the Evolution and 
Phylogenetics unit.) 
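
Matching an STR profile against a database can be sketched as a comparison of repeat counts. In the example below a profile records, for each STR locus, the number of repeats on each of the two chromosome copies. Only three loci are shown, and the repeat counts, the names in the database, and the matching rule are hypothetical. This is not a description of how the CODIS software works.

```python
Profile = dict[str, tuple[int, int]]


def profiles_match(sample: Profile, reference: Profile) -> bool:
    """True if the two profiles agree at every locus typed in both."""
    shared = sample.keys() & reference.keys()
    return bool(shared) and all(
        sorted(sample[locus]) == sorted(reference[locus]) for locus in shared
    )


def search(sample: Profile, database: dict[str, Profile]) -> list[str]:
    """Names of all database entries whose profile matches the sample."""
    return [name for name, profile in database.items()
            if profiles_match(sample, profile)]


database = {
    "person_A": {"TH01": (6, 9), "TPOX": (8, 8), "FGA": (21, 24)},
    "person_B": {"TH01": (7, 9), "TPOX": (8, 11), "FGA": (20, 22)},
}
crime_scene = {"TH01": (9, 7), "TPOX": (11, 8), "FGA": (22, 20)}

print(search(crime_scene, database))  # ['person_B']
```

The more loci that are compared, the smaller the chance that two unrelated people share the same profile.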

An unusual application of DNA fingerprinting technology is a project 
of Mary-Claire King's at the University of Washington. (See the Cell 
Biology and Cancer unit.) Although her research is primarily concerned 
with the identification of genetic markers for breast cancer, she also 
has a project to help the "Abuelas," or grandmothers, in Argentina. In 
Buenos Aires in the 1970s and 1980s, children of activists 
"disappeared" during the military dictatorship. The children were 
placed in orphanages or illegally adopted when their parents were 
killed. Now King is using mitochondrial DNA, which is inherited only 
maternally, to reunite the children with their grandmothers. 

The basis of many diseases is the alteration of one or more genes. 
Testing for such diseases requires the examination of DNA from an 
individual for some change that is known to be associated with the 
disease. Sometimes the change is easy to detect, such as a large 
addition or deletion of DNA, or even a whole chromosome. Many 
changes are very small, such as those caused by SNPs. Other changes can
affect the regulation of a gene and result in too much or too little of
the gene product. In most cases if a person inherits only one mutant 
copy of a gene from a parent, then the normal copy is dominant and 
the person does not have the disease; however, that person is a carrier 
and can pass the disease on to offspring. If two carriers produce a child 
and each passes the mutant allele to the child (a one-in-four probability), 
that individual will have the disease. 
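
The one-in-four figure follows from listing every combination of alleles the two parents can pass on. The sketch below does this for two carriers, writing the normal allele as "A" and the mutant allele as "a". It assumes that each parent passes on either allele with equal probability; the function name is hypothetical.

```python
from collections import Counter
from fractions import Fraction
from itertools import product


def offspring_genotypes(parent1: str, parent2: str) -> dict[str, Fraction]:
    """Probability of each child genotype, given the parents' two alleles."""
    counts = Counter(
        "".join(sorted(allele1 + allele2))  # "aA" and "Aa" are the same genotype
        for allele1, allele2 in product(parent1, parent2)
    )
    total = sum(counts.values())
    return {genotype: Fraction(n, total) for genotype, n in counts.items()}


for genotype, probability in offspring_genotypes("Aa", "Aa").items():
    print(genotype, probability)
```

For two carriers the result is one-quarter AA (unaffected), one-half Aa (carriers), and one-quarter aa (affected).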

Several different mutations in a gene often lead to a particular 
disease. Many diseases result from complex interactions of multiple 
gene mutations, with the added effect of environmental factors. 
Heart disease, type-2 diabetes and asthma are examples of such 
diseases. (See the Human Evolution unit.) Many diseases do not show 
simple patterns of inheritance. For example, the BRCA1 mutation is a 
dominant mutant allele that leads to an increased risk for breast and 
ovarian cancer. (See the Cell Biology and Cancer unit.) Although not 
everyone with the mutation develops the disease, the risk is much 
higher than for individuals without the mutation. 

Newborns commonly receive genetic testing. The tests detect genetic 
defects that can be treated to prevent death or disease in the future. 
Apparently normal adults may also be tested to determine whether 
they are carriers of alleles for cystic fibrosis, Tay-Sachs disease (a fatal 
disease resulting from the improper metabolism of fat), or sickle cell 
anemia. This can help them determine their risk of transmitting the 
disease to children. These tests as well as others (such as for Down's
syndrome) are also available for prenatal diagnosis of diseases. As new
genes are discovered that are associated with disease, they can be used
for the early detection or diagnosis of conditions such as familial
adenomatous polyposis (associated with colon cancer) or mutations in the
p53 tumor-suppressor gene (associated with aggressive cancers). The
ultimate value of gene testing will come with the ability to predict
more diseases, especially if such knowledge can lead to the disease's
prevention.

Gene therapy is a more ambitious endeavor: its goal is to treat or cure 
a disease by providing a normal copy of the individual's mutated gene. 
(See the Genetically Modified Organisms unit.) The first step in gene 
therapy is the introduction of the new gene into the cells of the 
individual. This must be done using a vector (a gene carrier molecule), 
which can be engineered in a test tube to contain the gene of interest. 
Viruses are the most common vectors because they are naturally able 
to invade the human host cells. These viral vectors are modified so that 
they can no longer cause a viral disease. 

Gene therapy using viral vectors does have a few drawbacks. Patients 
often experience negative side effects, and expression of the desired 
gene introduced by viral vectors is not always sufficiently effective. To 
counter these limitations, researchers are developing new methods for 
the introduction of genes. One novel idea is the development of a new 
artificial human chromosome that could carry large amounts of new 
genetic information. This artificial chromosome would eliminate the 
need for recombination of the introduced genes into an existing 
chromosome. Gene therapy is the long-term goal for the treatment of 
genetic diseases for which there is currently no treatment or cure.

Examining Gene Expression

Understanding the functions of genes depends on knowing when and 
in what cells they are each expressed. How can one measure the 
amount of mRNA transcribed from a gene in a particular cell type? The 
standard method uses a probe — a DNA sequence unique for that 
gene — which binds to the mRNA that has the complementary 
sequence. The more mRNA a particular cell produces, the more mRNA
binds to the probe, giving the probe an increased signal.
Because cDNA is complementary in sequence to mRNA, it can also be 
used to measure the expression of a particular gene. 

Organisms have so many genes in their genomes that studying the 
expression of all of these genes had been exceedingly difficult. Going 
from studying gene expression one gene at a time to examining 
expression patterns of a multitude of genes required new technology. 

In the late 1990s the development of microarray chips allowed
researchers to examine the expression of thousands of genes
simultaneously. This allowed for a much broader perspective of gene
expression than was possible when genes were analyzed one at a time.
Microarray chips are glass slides spotted with many rows containing
tiny amounts of probe DNA, one for each of thousands of genes
(Fig. 6). The target sample of interest, usually made from mRNA of a
specific type of cell, is labeled with a fluorescent dye and added to the
chip. If there is a match between the sample of interest and the DNA
probe on the chip, the two molecules will bind to each other. Then,
when exposed to a laser, the spot will fluoresce, producing a
detectable signal. (Figure 6 describes this process in more detail.)

Scientists can use microarrays, a rapid and sensitive test, in a variety of
experimental studies. Using microarrays, one can measure expression
patterns of large numbers of genes in different cell types (such as
cancer cells versus normal cells, or liver cells versus kidney cells).
Microarrays can also be used to examine the changes in gene expression
over time (for example, as an embryo develops), or changes in a given
cell type under different environmental conditions (various
temperatures, for instance).
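
When two samples are labeled with different colors, as in Figure 6, each spot can be summarized by the ratio of the two signals. The sketch below assumes red for one sample and green for the other and treats a two-fold difference as meaningful. The intensity values, the gene names, and the two-fold cutoff are hypothetical choices for illustration.

```python
import math


def classify_spot(red: float, green: float, threshold: float = 1.0) -> str:
    """Classify one spot from its red and green intensities (both above zero).

    The log2 ratio is +1 when red is twice green and -1 when green is
    twice red, so a threshold of 1.0 corresponds to a two-fold change.
    """
    log_ratio = math.log2(red / green)
    if log_ratio >= threshold:
        return "red"     # more mRNA in the red-labeled sample
    if log_ratio <= -threshold:
        return "green"   # more mRNA in the green-labeled sample
    return "yellow"      # similar amounts in both samples


spots = {"gene1": (800, 100), "gene2": (100, 800), "gene3": (500, 450)}
for gene, (red, green) in spots.items():
    print(gene, classify_spot(red, green))
```

Taking the logarithm treats increases and decreases symmetrically: an eight-fold increase gives +3 and an eight-fold decrease gives -3.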

Figure 6.

A) RNA is isolated from cells 
from two samples (in this 
illustration, infected and 
uninfected plant cells). 

B) The mRNA from both 
samples is copied to a more 
stable form, called cDNA, 
using reverse transcriptase. 

C) At the same time, the 
cDNA is labeled with 
fluorescent tags (a different 
color tag for each sample). 

D) The tagged cDNA is placed 
on the microarray chip, where 
it binds to the corresponding 
DNA that makes up the genes 
that have been previously 
spotted on the chip. 





E) The chip is placed in a 
laser scanner, which 
identifies the genes that 
hybridize to each sample 
(uninfected=green; 
infected=red; and both 
samples=yellow). 

F) The data are displayed on 
a computer screen where 
expression of the individual 
genes can be identified. 



Photo-illustration — Bergmann Graphics 
3D Graphics — KStar Productions 

Ethics

Possessing detailed knowledge about the genetic makeup of 
individuals raises several complex ethical quandaries. How confidential 
should genetic information be? How should privacy concerns be 
weighed against other interests? If genetic information related to 
disease genes should be as confidential as any other health-related 
information, should there be databases of detailed genomic 
information on individuals? Even without detailed genomic databases, 
thirteen genetic markers are sufficient for the FBI to identify every 
person except identical twins. Should this type of genetic information
be stored for all convicted criminals, for everyone arrested for a crime,
or for every individual, regardless of his or her past? Who should have
access to detailed genetic information if it becomes available? Should 
it be accessible to law enforcement officers, physicians, research 
scientists, employers and potential employers, or insurance providers? 

Sir Alec Jeffreys, the scientist who first developed the technique of 
genetic fingerprinting in Great Britain, is a proponent of a DNA 
database that contains the genetic profile of every individual in that 
country. To provide anonymity, however, he suggests that the actual 
identity of each individual be kept in a separate database with high 
security. Only certain circumstances, such as a link to a crime, would 
justify identification of the individual. 

The NIH-DOE Working Group on Ethical, Legal, and Social
Implications (ELSI) has recommended that employers be allowed to request
and use genetic information only to protect the health and safety of
workers, and that such information remain confidential. The group also
recommends that insurers be barred from using genetic information to deny
or limit health insurance coverage or to charge different fees based on
this information. Overall, the focus of legislation should be to prevent
discrimination against individuals based on genetic information.

In 1993, long before the human genome sequence was completed, a committee
of the Institute of Medicine of the National Academy of Sciences 
developed recommendations to prevent involuntary genetic testing 
and protect confidentiality. They concluded that the responsible use of 
genetic testing requires that individuals understand the tests, their 
significance, and their implications. Testing for diseases should be done 
only when individuals are capable of providing informed consent. This 
means not only that individuals must be informed, but that they also 
should understand the implications of that consent. Such informed 
consent requires an understanding of genetics by the public. Education 
in genetics must be increased to ensure that future generations have 
this knowledge. 

Patenting of human genes is another ethical concern emerging from 
the Human Genome Project. In order to be patentable under the U.S.
Federal Patent Act, an invention must be "novel, nonobvious, and 
have utility." In applying for a patent on a human gene, applicants 
generally claim that the patent's holders will add to the utility of the 
natural gene by developing tests and therapies to fight diseases 
associated with that gene. Opponents of gene patenting think that 
patents will limit the ability of other scientists to do additional 
research on these genes.

Most patents are filed by private companies that plan to develop and
market diagnostic tests and treatments that come from their research 
on a particular gene. These companies feel that, without a patent, they 
cannot afford to do the research that will lead to useful products. They 
argue they need the protection of a patent before they can invest 
millions of dollars in the development of new tests, drugs, and 
therapies. Some scientists counter that companies tend to patent genes 
even before they know what the gene does, so it is hard to understand 
how they can claim that they will increase the utility of such a gene. 
Making scientific data freely available, while still protecting the 
interests of private organizations that will provide the practical uses 
for the data, would be in the best interest of everyone. 

Epilogue 

The explosion of information coming from the sequencing of 
genomes has changed the landscape of biology. We now have tools 
to better understand the basis of disease and its prevention and 
control. These tools also allow us to design more effective drugs
and even to understand the genetic relationships among all living
things that make up the universal tree of life. Acquiring the sequence
was only the beginning.

Glossary



BAC. Bacterial artificial 
chromosome. A plasmid vector 
used to clone large fragments 
of DNA (average size of 150 kb) 
in E. coli. 

BLAST. Basic local alignment 
search tool. A computer program 
that identifies homologous 
(similar) genes in different 
organisms. 

Clone-based sequencing. A
genomic sequencing strategy that
is based on a hierarchical
approach. It uses mapping,
cloning of large DNA fragments,
and subcloning of small DNA
fragments in plasmids to organize
the sequenced fragments of DNA
into a single complete sequence.

cDNA. Also known as 
complementary DNA. DNA 
produced by reverse transcribing 
mRNA. It has the same sequence 
as the mRNA (except that a U is 
replaced by a T). 

CLUSTAL. A computer program 
that aligns conserved regions in 
multiple DNA or protein 
sequences. Used to determine the 
evolutionary relationships among 
genes or proteins. 

DNA fingerprint (DNA profile).
Nucleotide sequence variants that
are characteristic of an individual 
and can be used as a unique 
identifier of that individual. 

Exon. The sequence of a gene 
that encodes a protein. Exons may 
be separated by introns. 

Haplotype. Particular patterns of 
SNPs on a chromosome that are 
inherited together as a block. 

Homologous (homology).
Similarity of genes or other
features of organisms due to 
shared ancestry. 



Intron. The DNA sequence within 
a gene that interrupts the protein- 
coding sequence of a gene. It is 
transcribed into RNA but it is 
removed before the RNA is 
translated into protein. 

Knockout study. Inactivation of 
a specific gene; typically used in 
laboratory organisms to help to 
determine gene function. 

Microarray chip. Set of
miniaturized biochemical
reactions that occur in small spots 
on a microscope slide that may be 
used to test DNA fragments, 
antibodies, or proteins. 

Open reading frame (ORF). The
DNA or RNA sequence between
the start codon sequence and the 
stop codon sequence. 

Plasmid. A small, circular, self- 
replicating, extrachromosomal 
piece of DNA. Many artificially 
constructed plasmids are used as 
cloning vectors. 

Polymorphism. The presence of 
two or more variants of a genetic 
trait in a population. 

Protein motif. A pattern of
amino acids that is conserved
across many proteins and
confers a particular function
on the protein.

Short tandem repeat (STR).
Multiple adjacent copies of an
identical DNA sequence in a 
particular region of a 
chromosome. 

Single nucleotide 
polymorphism (SNP). Variations 
in the DNA sequence that occur 
when a single nucleotide (A, T, C, 
or G) in the genome sequence is 
changed. 



Synonymous mutation (silent 
mutation). A change in a 
nucleotide in the DNA sequence 
that does not result in a change in 
the amino acid in the protein. 

Transposable element. A type 
of DNA that can move from one 
chromosomal location to another. 

Whole genome shotgun 
sequencing. A genomic 
sequencing strategy that is 
based on cloning and sequencing 
millions of very small fragments 
of DNA, and then using 
computer programs to align the 
sequences together.

Proteins and Proteomics

"If DNA is the genetic blueprint then what is the proteome? 
What are the proteins of the cell? The proteins of the 
cell are the walls, the floor, the plumbing, the beds, the 
furniture, the sinks, the glasses — everything that goes 
on in the house. All of those processes are being carried 
out by proteins and so DNA may be providing the 
instructions but all the work is really being done by 
the proteins." Stanley Fields, PhD

What Is Proteomics?

A bacterial cell may seem simple but it's actually a complex structure — 
a gel-like matrix of the cytoplasm, surrounded by both a lipid bilayer 
cell membrane and a cell wall. The cell must perform many functions 
including the intake of nutrients, the metabolism of those nutrients, 
growth, cell division, and the excretion of wastes. What molecules are 
involved? Although the cytoplasm contains water, proteins, 
carbohydrates, various ions and assorted other molecules, proteins do 
most of the work. A typical bacterium requires more than 4,000 
proteins for growth and reproduction. Not all of the proteins are made 
at the same time and some are made only under special conditions, 
such as when the cell is stressed or finds itself in a novel environment. 
The complement of proteins found in this single cell in a particular 
environment is the proteome. Proteomics is the study of the 
composition, structure, function, and interactions of the proteins 
directing the activities of each living cell. 

If a bacterial cell needs more than 4,000 proteins, how many can we 
expect to find in animals? Mammals, including humans, probably have
more than 100,000 proteins. Although the genome contains the
genetic blueprint for an organism, the proteins of eukaryotes provide 
the unique structure and function that defines a particular cell or a 
tissue type, and ultimately defines an organism. Different types of 
cells make different proteins, so the proteome of one cell will be 
different from the proteome of another. In addition, cells that result 
from a disease, such as cancer, have a different proteome than normal 
cells. Therefore, understanding the normal proteome of a cell is 
critical in understanding the changes that occur as a result of disease. 
This knowledge can lead to an understanding of the molecular basis 
for the disease, which can then be used to develop treatment 
strategies. Knowing how the proteome changes as the organism 
grows may also provide insight into the mechanisms of development 
in healthy organisms. 



Under the classical concept of "one gene makes one enzyme," the 
proteome would simply comprise the products of all the genes present 
in the genome of an organism. But it is not that simple. The number of 
genes identified in the human genome is only about 30,000-35,000. 
How can only 35,000 genes encode more than 100,000 proteins? There 
are several possible answers to this question, which will be discussed in 
more detail below. One answer is that each gene may encode several 
proteins in a process called alternative splicing. Alternative splicing 
means that one gene may make different mRNA products and, hence, 
different proteins. Another answer is that one protein may be 
modified chemically after it is synthesized so that it acquires a different 
function. A third answer is that proteins interact with each other in 
complex pathways and networks of pathways, which may change their 
function. So, one gene may produce several, functionally different 
proteins in a variety of ways. 

Introduction to Protein Structure 

Although proteins are unique, they share certain common
characteristics (Fig. 1). The primary structure of each protein is
its specific sequence of amino acids, encoded by the mRNA. This
sequence directs the proper folding of the polypeptide chain into
the secondary structure. One type of secondary structure is the alpha
helix, a region of the polypeptide that folds into a corkscrew shape.
Beta strands are extended stretches of the polypeptide that bond
together side by side to form a flat beta sheet. Other regions of
secondary structure may include turns and random coils. These helices,
strands, turns, and coils
interact chemically with each other to form the unique three- 
dimensional shape of the protein, called the tertiary structure. For 
some proteins, a single polypeptide chain folded in its proper three- 
dimensional structure creates the final protein. Many proteins, 
however, have several different polypeptide subunits that make the 
final active protein. For these proteins, the interactions between the 
different subunits form the quaternary structure. 

Discrete portions of proteins can fold independently from the rest of 
the protein and have their own function. These are called domains, 
and serve as the building blocks of that protein. Domains are
evolutionarily mobile, capable of rearranging as new proteins evolve.
There are thousands of structural domains, and many of them have 
been conserved widely across proteins. New proteins appear to have 
arisen over evolutionary time by bringing different domains together 
in a process known as domain shuffling. Domains often contain 
smaller motifs, consisting of a conserved pattern of amino acids, or of 
combinations of structural elements formed by the folding of nearby 
amino acid sequences. An example of a motif is a helix-loop-helix, 
which binds to DNA. Very similar motifs are found in many proteins 
that are not related. Scientists have classified conserved domains and 
motifs in a number of databases so that new proteins can be easily 
analyzed for the presence of these elements. 

Figure 1. Primary Structure: The specific sequence of amino acids in
a polypeptide chain. Secondary Structure: The folding of the
polypeptide chain into specific shapes, such as the alpha helix and
beta pleated sheet. Other regions of secondary structure may include
turns and random coils. Tertiary Structure: The unique three-dimensional
shape that is the result of chemical interactions between amino acids
that fold the regions of secondary structure. Quaternary Structure: The
specific interaction of two or more polypeptide subunits.

Determining Protein Structure

While determining the polypeptide sequence resulting from gene
translation is straightforward, determining the actual three-dimensional
(3D) structure requires some sophisticated experimental techniques.
One such long-standing technique is X-ray crystallography, which is
based on the scattering of X-rays by the electrons in the crystal's
atoms. Think of the regular structure of table salt crystals. The atoms 
forming that structure are spaced very precisely in the crystal. Due to 
this regular spacing, a particular diffraction pattern forms when X-rays 
strike it. One can reconstruct the position of each atom in the crystal 
by observing the diffraction pattern and, thus, can make a three- 
dimensional map of the molecule. Although proteins are much more 
complex than table salt, researchers have crystallized many of them in 
their native configuration and have used X-ray crystallography to find 
their 3D structures. The 3D structures of proteins are available to all 
scientists in a public database called the "Protein Data Bank." 

Not all proteins can be crystallized, however. For example, 
membrane proteins have many hydrophobic amino acids and are 
particularly difficult to crystallize. A different technique to analyze 
proteins in solution is nuclear magnetic resonance (NMR). NMR is 
based on the principle that the nuclei of some elements' atoms, such
as hydrogen, resonate when a molecule, such as a protein, is placed in
a powerful magnetic field. NMR measures chemical shifts of the
atoms' nuclei in the protein, which depend on nearby atoms
and on their distances from each other. From these signals, researchers
derive a set of distances between specific pairs of atoms. NMR
data generate models of possible structures, rather than a single
structure. For smaller proteins in particular, NMR can quite
accurately determine the 3D structure.

Despite advances in techniques for determining protein structure, the 
structures of many proteins are still unknown. With the help of protein 
prediction programs, computer analysis of genome sequences is 
producing thousands of new hypothetical proteins of unknown 
structure and function. These proteins are called "hypothetical 
proteins" because they represent the products predicted from the 
gene sequence; however, there is, as yet, no evidence that they are 
actually made and there is no known function for them. 

Computer programs may help determine the structure of proteins 
whose function is not yet known. By comparing the sequence of the 
unknown protein to proteins with known 3D structures, these 
programs can make a predictive model of the unknown protein's 
structure using the known proteins as templates. The success of this 
method depends on the quality of the match between the known 
template proteins and the unknown target protein. In addition, when 
the function of the template protein is known, it may help identify the 
function of the unknown protein. These prediction programs do not 
produce structures with the detail or reliability of experimental 
techniques such as X-ray crystallography. They do, however, provide a 
means to analyze — in a reasonable time period — the large number 
of new proteins identified by the analysis of whole genomes. 
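
The "quality of the match" between a target and its possible templates is often summarized as percent identity over an alignment. The Python sketch below illustrates that one idea only, assuming the sequences have already been aligned (gaps written as "-"). The sequences and template names are invented for illustration, and actual modeling programs use far more elaborate scoring.

```python
def percent_identity(aligned_a, aligned_b):
    """Percent of aligned, non-gap positions where the two residues match."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    compared = matches = 0
    for a, b in zip(aligned_a, aligned_b):
        if a == "-" or b == "-":
            continue              # skip positions where either sequence has a gap
        compared += 1
        if a == b:
            matches += 1
    return 100.0 * matches / compared if compared else 0.0

def best_template(target, templates):
    """Name of the template whose aligned sequence is most identical to the target."""
    return max(templates, key=lambda name: percent_identity(target, templates[name]))

# Hypothetical target and pre-aligned template sequences
target = "MKTAYIAKQR"
templates = {
    "templateX": "MKTAYLAKQR",
    "templateY": "MRS-YIGHQK",
}
print(best_template(target, templates))
```

A template with higher identity to the target generally supports a more trustworthy model, which is the dependence described in the paragraph above.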

Structure and Function Relationships 
of Proteins 

The three-dimensional structure of a protein defines not only its size 
and shape, but also its function. One characteristic that affects 
function is the hydrophobicity of a protein, which is determined by the 
primary and secondary structure. For example, let's look at membrane 
proteins. Membranes contain large amounts of lipids, which are
notoriously hydrophobic (water and oil don't mix). The
membrane-spanning regions of membrane proteins are typically alpha helices,
made of hydrophobic amino acids. These hydrophobic regions interact 
favorably with the hydrophobic lipids in the membrane, forming stable 
membrane structures. 
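
One way to look for hydrophobic, possibly membrane-spanning stretches in a sequence is to average a per-residue hydrophobicity score over a sliding window. The Python sketch below uses the Kyte-Doolittle hydropathy scale, which is not mentioned in this text and is brought in here as an assumption. The 19-residue window and the 1.6 cutoff are commonly quoted heuristics, and both example sequences are invented.

```python
# Kyte-Doolittle hydropathy values (positive = hydrophobic)
KD = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}

def window_averages(seq, window=19):
    """Average hydropathy of every stretch of `window` consecutive residues."""
    scores = [KD[aa] for aa in seq]
    return [sum(scores[i:i + window]) / window
            for i in range(len(scores) - window + 1)]

def has_hydrophobic_stretch(seq, window=19, cutoff=1.6):
    """True if any window is hydrophobic enough to suggest a membrane helix."""
    return any(avg >= cutoff for avg in window_averages(seq, window))

# Hypothetical sequences: one polar, one with a long hydrophobic segment
soluble = "MSDKEEKRDGSTNQEKDSPRKEDGSTE"
membrane = "MSDKE" + "LLIVALLGIVFLAVLIAGL" + "KRDE"
print(has_hydrophobic_stretch(soluble), has_hydrophobic_stretch(membrane))
```

A positive result is only a hint. It says that a stretch is hydrophobic enough to be comfortable among membrane lipids, not that the protein has been shown to sit in a membrane.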

Hemoglobin is a soluble protein, found in the cytoplasm of red
blood cells as single molecules, which binds oxygen and carries it to the
tissues. In sickle cell anemia, a mutation in the beta-globin protein of
the red blood cell increases its hydrophobicity and causes the mutant 
protein molecules to stick to each other, avoiding the aqueous 
environment. Chains of hemoglobin change the shape of the red blood 
cell from round to a sickle shape, which causes the cells to collect in 
narrow blood vessels. 

The folding of a protein allows for interactions between amino acids 
that may be distant from each other in the primary sequence of the 
protein. In enzymes, some of these amino acids form a site in the 
structure that catalyzes the enzymatic reaction. This site, called the 
active site of the enzyme, has amino acids that bind specifically to the 
substrate molecule, also called a ligand (Fig. 2). In a similar manner,
certain sites in cell receptor proteins bind to specific ligand molecules 
that the receptor recognizes.

Figure 2. The active site of the penicillin-binding protein. The gray
stick-like structures represent the secondary and tertiary structure of
the penicillin-binding protein. Binding of the antibiotic, the
substrate, to the active site blocks the normal action of the protein in
the bacterial cell, resulting in death of the cell.
Photo-illustration adaption: Bergmann Graphics

Alterations in amino acids that may be distant from each other in the
primary sequence can lead to changes in folding. They may also cause
changes in chemical interactions among amino acids at the active site,
which alter the enzyme activity or binding of the ligands to receptor
proteins. Binding of ligands to an active site requires specific amino
acids. Therefore, an active site in a new enzyme that belongs to the
same family as a known enzyme can usually be identified by its
similarity to the active site of the known protein. Computer programs
can use the information from a database of known enzymes to predict
the active site of a new protein using a template-based method, similar
to that described above for determining the three-dimensional
structure of a protein. Once the program has identified the potential 
ligand-binding sites, other programs can test the fit and the binding 
ability of thousands of possible ligand molecules — even theoretical 
ligands that may not yet exist. This has tremendous possibilities for the 
design of new drugs, particularly for cancer therapy. 

Protein Modification 

The complexities of the 3D structure of proteins are not the only 
difficulty in characterizing proteins. Many proteins carry additional
chemical groups that modify their structure. The final structure of a
protein may include any number of modifications that occur during and
after the synthesis of the protein on the ribosome. Modifications made
after synthesis are called post-translational modifications; others
occur during translation and are required for proper folding of the
protein. Both kinds change the size and the structure of the final
protein. One possible modification is enzymatic cleavage of the
original polypeptide by proteases to produce a smaller product. Other 
modifications include the addition of sugar molecules to certain amino 
acids in the protein (glycosylation), or the addition of a phosphate 
group (phosphorylation) or a sulfate group (sulfation). 

Many proteins are modified by proteases that remove short peptides 
from either end of the protein. The shortened polypeptides then fold 
into an active protein. One of the most common of these cleavages is 
the removal of specific signal peptides. These peptides target proteins 
for transport to a particular cellular organelle in a process known as 
protein sorting. An example of this is the hormone insulin, which is 
made as preproinsulin. After removal of the 24-amino-acid signal 
peptide from preproinsulin to form proinsulin, the latter polypeptide is 
further processed in the endoplasmic reticulum. This produces the final 
hormone, insulin, which is released from the cell. 

Glycosylation — the addition of specific short-chain sugars to 
asparagine, serine, or threonine — is very common in membrane 
proteins that form structural components of the cell surface. These 
proteins, called glycoproteins, are important in many cell processes, 
including binding by receptors and eliciting an immune response. 
Glycoproteins are often specific cell markers. For example, ABO blood 
types result from the presence or absence of specific glycoproteins 
(A-type, B-type, both, or neither) on the surface of red blood cells. 
Human immunoglobulin G (IgG) is also a glycoprotein in which the
sugar appears to be very important for the normal function of the
protein in the immune response. Scientists have discovered that
abnormal sugars in IgG strongly correlate with the autoimmune 
disease called rheumatoid arthritis, characterized by chronic joint 
inflammation, and the presence of antibodies directed against IgG 
and other host proteins. 
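
Candidate sites for sugar attachment to asparagine can be found by scanning a protein sequence for a short pattern. The Python sketch below assumes the widely described consensus for this kind of glycosylation: asparagine (N), then any amino acid except proline, then serine (S) or threonine (T). That consensus is not stated in this text, the sequence is invented, and a match marks only a possible site, not a confirmed one.

```python
import re

# N, then any residue except P, then S or T. The lookahead lets
# closely spaced candidate sites all be reported.
SEQUON = re.compile(r"N(?=[^P][ST])")

def candidate_glycosylation_sites(seq):
    """1-based positions of asparagines that fit the N-X-S/T pattern."""
    return [m.start() + 1 for m in SEQUON.finditer(seq)]

# Hypothetical protein sequence
protein = "MKNGSAANPTLLNKTQ"
print(candidate_glycosylation_sites(protein))
```

In this made-up sequence the asparagine followed by proline is skipped, which shows how a single neighboring residue can rule a site out.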

Reversible phosphorylation of threonine, serine, or tyrosine residues 
by enzymes called kinases (which add a phosphate) and 
phosphatases (which remove the phosphate) plays an important
role in the regulation of many cell processes, such as growth and 
cell cycle control. (See the Cancer unit.) Phosphorylation may occur 
sequentially from one protein to another, resulting in a series of 
activations called a "phosphorylation cascade."

Genomics-Based Predictions of
Cellular Proteins 

We now have large databases of gene sequences, predicted protein 
sequences, and known 3D protein structures; yet we still don't know 
the total protein composition of a cell. Determining the proteome of a 
cell is a complicated task. There are two approaches to obtaining this 
information: computer-based and experimental. 

The computer-based method uses the genome sequence of an 
organism to predict genes, based on known characteristics of protein- 
coding regions of the genome. (See the Genomics unit for a discussion 
of computer-based methods for gene identification and microarrays to 
identify expressed genes.) However, even if we know that a particular 
sequence is a gene, we don't necessarily know all the possible proteins 
it makes. 

One reason is that one gene may produce more than one mRNA. RNA 
splicing is the normal process in which intron sequences are removed 
from the pre-mRNA, producing the mRNA, which corresponds to the 
exons. However, some transcripts can be spliced in alternative ways 
(alternative splicing), joining different exons (Fig. 3). The result is two
or more different mRNA molecules from one gene. Variants of a 
protein produced by alternative splicing may have a similar 
physiological activity, a different and unrelated activity, or no activity 
at all. According to one estimate, about forty percent of human genes 
are alternatively spliced. This is one mechanism that accounts for the 
relatively large number of proteins produced by only about 35,000 
human genes.

Figure 3. More than one protein can
be made from a gene. In this case, 
three different mRNA molecules are 
made from one gene. The exons (the 
numbered boxes) can combine in 
different configurations to eventually 
form different proteins. 
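
The way a few exons can yield several mRNAs can be counted directly. The Python sketch below assumes the simplest model of alternative splicing, in which exon order is kept and any subset of designated optional exons may be skipped. The exon numbers and the choice of which exons are optional are hypothetical, and real genes use other splicing patterns as well.

```python
from itertools import combinations

def splice_isoforms(exons, optional):
    """All mRNAs made by keeping exon order and skipping any subset of optional exons."""
    isoforms = []
    for k in range(len(optional) + 1):
        for skipped in combinations(optional, k):
            isoforms.append([e for e in exons if e not in skipped])
    return isoforms

# Hypothetical gene with four exons; exons 2 and 3 may each be skipped
exons = [1, 2, 3, 4]
isoforms = splice_isoforms(exons, optional=[2, 3])
for mrna in isoforms:
    print("-".join(str(e) for e in mrna))
```

Under this model, each additional optional exon doubles the number of possible mRNAs, so two optional exons give four products and ten would give 1,024.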



A more direct approach to identify proteins in a cell is to measure 
enzyme activities and other functions for which there are biochemical 
assays. In some cases, we can identify the function of new proteins by 
combining our knowledge of metabolic pathways in many organisms 
with the predicted function from genome analysis. With this type of 
information, researchers can readily identify new enzymes. To do this, 
they examine the similarity of the genome sequences to known 
enzymes, as well as the presence (in the same genome) of the proteins 
that are required for the other steps in the metabolic pathway.

2D Gel Electrophoresis to Identify
Cellular Proteins 

While computer-based methods are powerful, they can only predict 
the function of proteins for which some information is already 
available. How do we understand the proteins that we don't already 
know about? This requires experimental approaches. 

One way to identify proteins is to extract all the proteins from a 
sample of cells and separate them in a gel matrix, using a technique 
called polyacrylamide gel electrophoresis (PAGE). The proteins are 
separated by size, with the smaller proteins moving faster through the 
gel than the larger proteins. After staining, a pattern of bands appears 
that corresponds to the proteins in the cell. However, this technique 
can only resolve a few hundred proteins, and cannot separate proteins 
of very similar size. 

A modification of this procedure, called 2D gel electrophoresis,
separates proteins into two dimensions, using two different
characteristics. Proteins are separated in the first dimension by their
isoelectric point (pI), the specific pH at which the net charge of
the protein is zero. These separated proteins, in a flat gel strip, are
then placed on a standard polyacrylamide gel. Every protein band that 
was separated in the first dimension according to its isoelectric point is 
now separated in the second dimension by its size. The result is small 
spots, each representing a protein; even proteins of the same size will 
be resolved if they have a different isoelectric point. A good 2D gel can 
resolve one thousand to two thousand proteins, which appear, after 
staining, as dots in the gel (Fig. 4). This technique is useful when
comparing two similar samples to find specific protein differences; for 
example, comparing the proteins in a tumor cell versus a normal cell. 
However, it can miss very small proteins or non-abundant proteins. 
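
The isoelectric point used in the first dimension can be estimated from a sequence by finding the pH at which the summed charges of the ionizable groups cancel. The Python sketch below does this by bisection. The pKa values are approximate, textbook-style numbers adopted here as assumptions (published scales differ), and the two ten-residue peptides are invented to show that molecules of the same length can have very different pI values.

```python
# Approximate pKa values for ionizable groups (assumed; published scales vary)
PKA_POSITIVE = {"K": 10.8, "R": 12.5, "H": 6.5}
PKA_NEGATIVE = {"D": 3.9, "E": 4.1, "C": 8.5, "Y": 10.1}
PKA_N_TERM, PKA_C_TERM = 8.6, 3.6

def net_charge(seq, ph):
    """Estimated net charge of a peptide at a given pH."""
    charge = 1.0 / (1.0 + 10 ** (ph - PKA_N_TERM))     # protonated amino end
    charge -= 1.0 / (1.0 + 10 ** (PKA_C_TERM - ph))    # deprotonated carboxyl end
    for aa in seq:
        if aa in PKA_POSITIVE:
            charge += 1.0 / (1.0 + 10 ** (ph - PKA_POSITIVE[aa]))
        elif aa in PKA_NEGATIVE:
            charge -= 1.0 / (1.0 + 10 ** (PKA_NEGATIVE[aa] - ph))
    return charge

def isoelectric_point(seq, lo=0.0, hi=14.0, tol=1e-4):
    """pH at which the net charge is zero, found by bisection."""
    while hi - lo > tol:
        mid = (lo + hi) / 2.0
        if net_charge(seq, mid) > 0:
            lo = mid      # still positive, so the pI lies at a higher pH
        else:
            hi = mid
    return (lo + hi) / 2.0

# Two hypothetical peptides of equal length but different composition
basic_peptide = "AAKKAAKKAA"
acidic_peptide = "AADDAAEEAA"
print(round(isoelectric_point(basic_peptide), 2),
      round(isoelectric_point(acidic_peptide), 2))
```

Because net charge only decreases as pH rises, bisection is guaranteed to close in on the single pH where the charge crosses zero.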




Figure 4. Haemophilus influenzae cell proteins separated by 2D gel
electrophoresis. The basic proteins are to the right of the gel and the
acidic proteins to the left. High molecular weight proteins are to the
top of the gel. Molecular weight markers shown (kDa): 68.5, 44.4, 28.8,
and 16.
Courtesy of Phil Cash, PhD, University of Aberdeen.

Mass Spectrometry to Identify
Cellular Proteins

While the 2D gel method easily separates proteins, it doesn't identify 
them. If there are differences in spots between the proteins in a cancer 
cell and a normal cell, this method cannot determine the actual 
identity of the different proteins in the two cell types. To identify these
proteins, individual spots are excised from 2D gels, typically digested
into peptides with a protease such as trypsin, and then subjected
to mass spectrometry, which separates charged particles, or ions,
according to mass. First the molecules in the sample are ionized to
produce a population of charged molecules. A mass analyzer then
separates the sample's molecules based on their mass-to-charge ratio. A
detector then produces a peak for each ion; the position of the peak
gives the mass and its height reflects the amount of the ion. A computer
program reads the complex spectral information from the mass
spectrometry process. The program matches each peptide's mass against
the masses of theoretical, predicted peptides, based on known proteins
in databases. This is called peptide mass mapping. With many
different peptides for each protein, the computer can match the
set of masses to one or more known proteins. Peptide mass mapping can
only be used in situations where the genome has been sequenced and
all predicted proteins for the genome are known.
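
The matching step in peptide mass mapping can be sketched in a few lines of Python. The example assumes the protein is cut with trypsin, following the usual simplified rule of cutting after lysine (K) or arginine (R) unless the next residue is proline (P), and it compares neutral monoisotopic peptide masses rather than the mass-to-charge values an instrument reports. The two database proteins, their names, and the 0.01 mass tolerance are hypothetical.

```python
# Monoisotopic residue masses in daltons
MONO = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276, "V": 99.06841,
    "T": 101.04768, "C": 103.00919, "L": 113.08406, "I": 113.08406, "N": 114.04293,
    "D": 115.02694, "Q": 128.05858, "K": 128.09496, "E": 129.04259, "M": 131.04049,
    "H": 137.05891, "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.01056

def tryptic_digest(seq):
    """Cut after K or R, except when the next residue is P."""
    peptides, start = [], 0
    for i, aa in enumerate(seq):
        next_aa = seq[i + 1] if i + 1 < len(seq) else ""
        if aa in "KR" and next_aa != "P":
            peptides.append(seq[start:i + 1])
            start = i + 1
    if start < len(seq):
        peptides.append(seq[start:])
    return peptides

def peptide_mass(peptide):
    """Neutral monoisotopic mass: residue masses plus one water."""
    return sum(MONO[aa] for aa in peptide) + WATER

def count_matches(observed, protein_seq, tol=0.01):
    """Number of observed masses explained by the protein's predicted peptides."""
    theoretical = [peptide_mass(p) for p in tryptic_digest(protein_seq)]
    return sum(any(abs(m - t) <= tol for t in theoretical) for m in observed)

def best_match(observed, database, tol=0.01):
    """Database entry whose predicted peptides explain the most observed masses."""
    return max(database, key=lambda name: count_matches(observed, database[name], tol))

# Hypothetical database; the "observed" masses are simulated from proteinA
database = {"proteinA": "AGKLLSDEFRVVTNK", "proteinB": "MEELRGGHYPWKSTAAK"}
observed = [peptide_mass(p) for p in tryptic_digest(database["proteinA"])]
print(best_match(observed, database))
```

The sketch also shows why the method depends on a sequenced genome: a protein can be recognized only if its predicted peptides are already in the database being searched.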

Another application of mass spectrometry is protein fingerprinting.
This technique has been used to identify unique sets of proteins in
blood, which serve as markers for different forms of cancer. 
Interestingly, for this method to be useful, we do not need to know 
the actual identities of the particular proteins used as markers for a 
disease. Instead, this technique relies on pattern recognition software. 
Using training data from samples from individuals with and without 
cancer, the program searches for a particular pattern of peaks that 
correlates with cancer. This technique requires only a drop of blood 
and does not require any detailed genetic information; however, its 
accuracy in predicting some forms of cancer is limited because the 
number of marker peptides is not sufficiently large. As more samples 
are evaluated, the accuracy will likely increase because the software 
will be able to find more accurate peptide patterns correlating to 
cancer. Proteomic fingerprinting holds great promise as a diagnostic 
tool for a variety of diseases that produce distinctive patterns of 
proteins in blood. 
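
The idea of learning a pattern of peaks from training samples can be illustrated with a very small classifier. The Python sketch below uses a nearest-centroid rule, chosen only because it is short; it is not the method used by actual diagnostic software. Each "spectrum" is a list of peak heights at four hypothetical mass positions, and all of the numbers are invented.

```python
def centroid(spectra):
    """Average peak height at each position across a group of spectra."""
    n = len(spectra)
    return [sum(column) / n for column in zip(*spectra)]

def distance(a, b):
    """Euclidean distance between two peak-height patterns."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5

def train(labeled_spectra):
    """Build one average pattern per label from the training samples."""
    return {label: centroid(spectra) for label, spectra in labeled_spectra.items()}

def classify(model, spectrum):
    """Assign the label whose average pattern is closest to the new spectrum."""
    return min(model, key=lambda label: distance(model[label], spectrum))

# Hypothetical training data: peak heights at four mass positions
training = {
    "cancer": [[0.9, 0.1, 0.8, 0.2], [0.8, 0.2, 0.7, 0.3]],
    "healthy": [[0.2, 0.7, 0.1, 0.6], [0.3, 0.8, 0.2, 0.7]],
}
model = train(training)
print(classify(model, [0.8, 0.2, 0.9, 0.1]))
```

Note that the classifier never needs to know which proteins produce the peaks, which mirrors the point made above about marker identities.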

Identifying Protein Interactions 

While it is convenient to think of proteins as discrete and independent 
molecules, this is actually an oversimplified view. Many proteins 
require other proteins or cofactors for activity; and proteins involved in 
signal transduction, protein trafficking, cell cycle, and gene regulation 
must interact with other proteins in those processes. Many of these 
interactions require particular domains called interaction domains. 
Proteins involved in the interactions contain combinations of 
interaction domains (for interaction with other proteins) and catalytic 
domains (for function of the protein). The interaction domain can 
bind the partner protein, even in the absence of the rest of the 
protein. Interaction domains are often quite versatile, capable of 
binding a variety of related ligands. In addition, one protein may 
contain several different interaction domains. The modular nature of 
these domains allows the protein to interact with multiple target 
proteins in the cell; thus it provides a mechanism for integration and 
control of information from protein to protein in a cell. Such protein- 
protein interactions form the basis for our current understanding of 
cell signaling pathways and protein networks that regulate all the 
activities in a cell. 

Because protein-protein interactions regulate the activities of cells, 
identifying them is critical to understanding cellular processes. 
Mass spectrometry techniques have been developed for large-scale 
screening to identify interacting proteins. For example, hundreds of 
known proteins in yeast were engineered to contain a biochemical 
tag that would allow the tagged protein to be separated from other 
proteins in a cell extract. This was done gently so that other proteins 
bound to the tagged protein would still be attached. The tagged 
protein, along with any associated proteins, was then analyzed by 
mass spectrometry. The results revealed that about eighty-five 
percent of these proteins were associated with other proteins. 
Although most interacted with many other proteins, in some cases 
two different protein complexes had at least one protein in common. 
Among the most intriguing questions to come out of this research 
were what controls which proteins interact and — for those that 
interact in multiple complexes — how do these proteins know which 
complex to join? 

The Yeast Two-Hybrid System 

The yeast two-hybrid system is a powerful technique for identifying 
multiprotein complexes. Using genetically engineered yeast, scientists 
can identify complexes when specific pairs of interacting proteins 
activate expression of a reporter gene. One often-used reporter gene is 
the lacZ gene. When two proteins interact in the yeast cell they 
activate expression of this gene, allowing yeast cells to metabolize an 
indicator that turns these cells a different color. The interacting 
proteins are then identified from the colonies formed by these colored 
cells. The two-hybrid system has been expanded to use microarrays of 
cloned yeast genes (see below). These large-scale yeast two-hybrid 
assays can provide information on thousands of protein-protein 
interactions. Using this technology, researchers are identifying all the 
proteins in yeast that interact, and they will then map the complex 
network of cellular functions to these interacting proteins. 

Protein Microarrays 

Another strategy for the large-scale study of proteins is similar to 
DNA microarrays, which measure gene expression in different cell 
types. (See the Genomics unit.) Based on the rapid, large-scale 
technology (often called high-throughput technology) that was 
developed for DNA microarrays, scientists have developed similar 
microarrays for proteins. In a protein microarray, very small amounts of 
different purified proteins are placed on a glass slide in a pattern of 
columns and rows. These proteins must be pure, fairly concentrated, 
and folded in their active state. Various types of probe molecules may 
be added to the array and assayed for ability to bind or react with the 
protein. Typically the probe molecules are labeled with a fluorescent 
dye, so that when the probe binds to the protein it results in a 
fluorescent signal that can be read by a laser scanner. 

This technology can complement other techniques, such as mass 
spectrometry and yeast two-hybrid assays, to identify thousands of 
protein-protein interactions. Protein arrays can be screened for their 
ability to bind other proteins in a complex, receptors, antibodies, lipids, 
enzymes, peptides, hormones, specific DNA sequences, or small 
molecules, such as potential new drugs. One of the most promising 
applications for protein microarrays is the rapid detection or diagnosis 
of disease by identifying a set of proteins associated with the disease. 

One example of the use of this technique is the development of a 
microarray that may help in the treatment of cancer. This microarray 
contains many different mutant forms of a protein called p53. P53 is an 
anti-cancer protein, called a "tumor-suppressor protein," and about 
half of all cancers have mutations in p53. (See the Cancer unit.) 
Researchers can screen the immobilized mutant p53 proteins in the 
microarray for biological activity, as well as for new drugs that can 
restore its normal tumor-suppressing function. 

Protein Networks 

The cell is a complex and dynamic system of networks of interacting 
molecules. An understanding of the cell requires analyzing these 
complex interactions as a system. Systems biology takes the approach 
that the powerful high-throughput techniques, developed as part of 
whole genome and proteome analysis, will allow the simultaneous 
study of complex interactions of networks of molecules, including 
DNA, RNA, and proteins. Fully understanding complex networks of 
molecular interactions in the cell requires a combination of several 
different experimental techniques, including DNA and protein 
microarrays, mass spectral analysis, and two-hybrid analysis. This, 
combined with the power of computers to analyze the massive amount 
of data, produces models of interacting networks, which better 
describe the workings of a cell (Fig. 5). 

Figure 5. A network of 
protein-protein interactions 
in a yeast cell 



Schwikowski et al., A NETWORK OF PROTEIN-PROTEIN 
INTERACTIONS IN YEAST (2000). 
Courtesy of Nature Publishing Group. 
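
A network like the one in Figure 5 can be represented in a computer as a graph built from pairwise interaction data. The sketch below is a minimal illustration under stated assumptions: the protein names are placeholders, the interaction list is invented, and the function names are hypothetical rather than taken from any published analysis.

```python
from collections import defaultdict

def build_network(interactions):
    """Build an undirected graph: protein -> set of interaction partners."""
    network = defaultdict(set)
    for a, b in interactions:
        network[a].add(b)
        network[b].add(a)
    return network

def hubs(network, min_partners):
    """Proteins with at least min_partners partners, sorted by name."""
    return sorted(p for p, partners in network.items()
                  if len(partners) >= min_partners)

# Hypothetical interaction pairs, such as might come from two-hybrid
# screens or mass spectrometry of tagged complexes.
interactions = [("A", "B"), ("A", "C"), ("A", "D"), ("B", "C"), ("D", "E")]
network = build_network(interactions)

print(hubs(network, 3))      # highly connected proteins
print(sorted(network["D"]))  # partners of protein D
```

Combining pairs from several experimental techniques into one graph is what allows highly connected proteins and shared complex members to be picked out computationally.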

Proteomes in Different Organisms 

Although scientists have sequenced dozens of genomes from 
organisms as diverse as viruses, bacteria, nematode, fruit fly, puffer 
fish, mouse, and human, we still don't know what uniquely 
characterizes each of these organisms. For example, both mouse and 
human genomes contain around 30,000 genes. How many of these 
genes do they share? Based on comparisons of the two genomes, 
ninety-nine percent of the genes are conserved in both species and are, 
thus, derived from a common evolutionary ancestor. The remaining 
one percent represents genes that evolved independently in mouse or 
human. If these two organisms share so many similar genes, how can 
they be so different? A simple example may help us to understand that 
the presence of a gene does not mean that the protein is expressed. 
Pigs produce cell surface proteins, which are modified by glycosylation 
to contain a sugar called galactose (GAL). Those GAL-proteins, present 
in pig blood vessels, are seen as foreign by the human immune system. 
This leads to the very rapid destruction of pig organs that have been 
transplanted into humans when a human organ was not available. 
Interestingly, humans lack GAL-proteins but still have the gene for 
making them; the gene is not expressed in humans. Therefore, the 
presence of a gene does not mean that it is expressed. In fact, every 
somatic cell in an organism shares the same genes; so, the differences 
between tissue types — say liver and heart — result from differences in 
gene expression. (See the Genes and Development unit.) 

Identification of proteins may provide the most useful information in 
determining the significant differences between species. How different 
are the proteins in even closely related organisms? With the 
development of proteomic techniques, scientists are beginning to 
tackle this difficult question. One answer is that very similar genes in 
two organisms may be expressed very differently. Dr. Svante Pääbo of 
the Max Planck Institute for Evolutionary Anthropology analyzed the 
proteins from the brains of humans and chimps. (See the Genomics and 
Human Evolution units.) He found that many very similar genes 
produced much more protein in human brain cells than in chimp brain 
cells. In contrast, the same type of experiment done with blood or liver 
cells showed much less difference between human and chimp in the 
amount of protein produced. 

At a different level, there are some clear differences in protein 
composition between the cells of eukaryotes and those of the other 
kingdoms. One is that eukaryotes have many more long proteins, more 
proteins with regular secondary structure and less random globular 
structure, and more loop regions in their proteins. Certain conserved 
structural domains show up in proteins, but are used in a number of 
different pathways. While there are many protein homologues 
conserved across many different organisms, some proteins are unique 
to one organism. As more genomes and proteomes are characterized, 
comparative genomics and proteomics will allow scientists to further 
understand how organisms differ. 

Proteomics and Drug Discovery 

One of the most promising developments to come from the study of 
human genes and proteins has been the identification of potential 
new drugs for the treatment of disease. This relies on genome and 
proteome information to identify proteins associated with a disease, 
which computer software can then use as targets for new drugs. For 
example, if a certain protein is implicated in a disease, the 3D structure 
of that protein provides the information a computer program needs 
to design drugs to interfere with the action of the protein. A molecule 
that fits the active site of an enzyme, but cannot be released by the 
enzyme, will inactivate the enzyme. This is the basis of new drug- 
discovery tools, which aim to find new drugs to inactivate proteins 
involved in disease. As genetic differences among individuals are 
found, researchers will use these same techniques to develop 
personalized drugs that are more effective for the individual. 

Virtual ligand screening is a computer technique that attempts to fit 
millions of small molecules to the three-dimensional structure of a 
protein. The computer rates the quality of the fit to various sites in the 
protein, with the goal of either enhancing or disabling the function of 
the protein, depending on its function in the cell. A good example of 
this is the identification of new drugs to target and inactivate the HIV-1 
protease. The HIV-1 protease is an enzyme that cleaves a very large HIV 
protein into smaller, functional proteins. The virus cannot survive 
without this enzyme; therefore, it is one of the most effective protein 
targets for killing HIV (Fig. 6). 




Photo-illustration — Bergmann Graphics 



Because many proteins have multiple functions, it may be necessary to 
develop drugs for each function of a multitask protein. In addition, 
most proteins act as part of complexes and networks, which may also 
affect the way a protein acts in a cell. This may also affect the ability of 
drugs to disable the protein. Understanding the proteome, the 
structure and function of each protein, and the complexities of 
protein-protein interactions will be critical for developing the most 
effective diagnostic techniques and disease treatments in the future. 



Figure 6. In virtual ligand screening, 
the three-dimensional image of the 
protein is fed into a computer, which 
attempts to fit millions of small 
molecules to a targeted active site. 
Small molecules that bind well to the 
protein become good leads for 
potential new drugs. 

Ethics and the Economics of 
Drug Discovery 

Drug discovery is simple compared to drug development, which 
requires testing the efficacy and the safety of new drugs through 
clinical trials. The time (twelve to fifteen years) and cost 
(approximately 800 million dollars) of drug development are 
significant economic factors that limit the number of new drugs that 
come to market; many approved drugs never recover the cost of their 
development. How do companies decide which promising new drugs 
to develop? Clearly, there must be very good evidence that the new 
drug will be effective. But that is not enough; companies also carefully 
consider the economics of each potential new drug. What is the size of 
the market for that new drug? How strong is the demand? How 
effective are current drugs and what are their costs? 

The harsh reality of these economics is that new drugs that may 
benefit only a few are unlikely to make it to clinical trials. Drugs that 
may benefit millions of people in developing countries too poor to pay 
for the new drug will also have a low priority for development. While 
AIDS, malaria, and tuberculosis affect countries that together contain 
ninety percent of the world's population, only about ten percent of 
the world's medical research funding is targeted at these diseases. 
Partnerships among government agencies, charitable organizations, 
and the pharmaceutical industry may allow companies to allocate 
some of their resources to developing drugs that will never recover 
their cost. In 2001 GlaxoSmithKline Biologicals, in partnership with the 
World Health Organization and the non-profit organization Program 
for Appropriate Technology in Health, began a program to develop a 
vaccine for childhood malaria. 

Many currently patented drugs could be manufactured in third world 
countries as generic versions. However, pharmaceutical companies have 
strongly opposed this practice, fearing that these generic drugs will be 
inferior to the name brands and would enter the U.S. and European 
markets at low prices. Brazil has registered generic versions of several 
AIDS drugs, and manufactures them for itself and other developing 
countries. In response to worldwide pressure, drug companies have 
agreed to sell some AIDS drugs at deep discounts to developing 
countries. However, even with the discounts, the price is much higher 
than the generic version, limiting the number of AIDS victims who can 
be treated in poorer nations. 

Glossary 



2D gel electrophoresis. 

A technique for separating 
proteins to further identify and 
characterize them. Proteins are 
separated in the first dimension 
based on their isoelectric point, 
and then in the second dimension 
by molecular weight. 

Active site. The specific part of 
an enzyme that binds the 
substrate. 

Alternative splicing. A 
biological process in which introns 
are removed from RNA in 
different combinations to 
produce different mRNA 
molecules from one gene; 
sometimes called "RNA 
alternative splicing." 

Catalytic domain. The regions 
of a protein that interact to form 
the active or functional site of 
the protein. 

Domain. A discrete part of 
a protein that folds 
independently of the rest 
and has its own function. 

Domain shuffling. The creation 
of new proteins by bringing 
different domains together. 

Exon. The sequence of a gene 
that encodes a protein. Exons may 
be separated by introns. 

Glycosylation. The modification 
of a protein by adding sugar 
molecules to particular amino 
acids in the protein. 

High-throughput technology. 

Large-scale methods to purify, 
identify, and characterize DNA, 
RNA, proteins, and other 
molecules. These methods are 
usually automated, allowing 
rapid analysis of very large 
numbers of samples. 



Interaction domain. A discrete 
module of a protein that is 
involved in interactions with 
other proteins. 

Intron. The DNA sequence within 
a gene that interrupts the protein- 
coding sequence of a gene. It is 
transcribed into RNA but it is 
removed before the RNA is 
translated into protein. 

Isoelectric point. The pH at which 
the net charge of the protein is 
zero. Proteins are positively 
charged at pH values below their 
pI and negatively charged at pH 
values above their pI. 

Kinase. An enzyme that catalyzes 
the transfer of a phosphate group 
from ATP to another molecule, 
often a protein. 

Ligand. A molecule that binds to 
a protein, usually at a specific 
binding site. 

Mass spectrometry. A technique 
that separates proteins based on their 
mass-to-charge ratio, allowing 
identification and quantitation of 
complex mixtures of proteins. 

Motif. A short region in a protein 
sequence that is conserved in 
many proteins. 

Nuclear magnetic resonance 
(NMR). A technique for 
determining the structure of 
molecules, which is based on the 
resonance of the nuclei of certain 
atoms when the molecule is 
placed in a strong magnetic field. 

Peptide mass mapping. 

A technique for identifying 
proteins by mass spectrometry; 
combined with a computer 
program that matches the 
information on each peptide's mass 
against the mass of theoretical, 
predicted peptides, based on 
known proteins in databases. 



Polyacrylamide gel 
electrophoresis (PAGE). 

A technique used to separate 
proteins in a gel matrix by 
their relative movement in 
an electric field. 

Phosphatase. An enzyme that 
removes a phosphate group from 
a molecule, such as a protein. 

Phosphorylation. The addition 
of a phosphate group to a 
molecule, such as a protein. 

Primary structure. The sequence 
of amino acids that makes up the 
polypeptide chain. 

Protein fingerprinting. The 
identification of the proteins in a 
sample by analytical techniques, 
such as gel electrophoresis and 
mass spectrometry. 

Protein sorting. The processes in 
which proteins synthesized in the 
cytosol are further modified and 
directed to the appropriate 
cellular location for their 
particular function. 

Proteome. The complete 
collection of proteins encoded by 
the genome of an organism. 

Quaternary structure. 

The association of two or more 
polypeptides into a larger protein 
structure. 

Secondary structure. 

The arrangement of the amino 
acids of a protein into a regular 
structure, such as an alpha-helix 
or a beta sheet. 

Tertiary structure. The folding 
of a polypeptide chain into a 
three-dimensional structure. 

Virtual ligand screening. 

A computer-based technology 
that simulates the interaction 
between proteins and small 
molecules to identify those that 
might be pharmaceutically active 
and useful as drugs. 

X-ray crystallography. 

A method for determining the 
structure of a molecule, such as a 
protein, based on the diffraction 
pattern produced when X-ray 
radiation is focused onto pure 
crystals of the molecule. 

Yeast two-hybrid system. 

A method used to identify 
protein-protein interactions. 
A protein of interest serves as 
the "bait" to fish for and bind 
to unknown proteins, called 
the "prey." 

Evolution and 
Phylogenetics 

"Systems of classification are not hat racks, objectively 
presented to us by nature. They are dynamic theories 
developed by us to express particular views about the 
history of organisms. Evolution has provided a set of 
unique species ordered by differing degrees of genealogical 
relationship. Taxonomy, the search for this natural order, 
is the fundamental science of history." Stephen J. Gould 1 

Perhaps the most striking feature of life is its enormous diversity. There 
are more than one million described species of animals and plants, 
with many millions still left undescribed. (See the Biodiversity unit.) 
Aside from their sheer numbers, organisms differ widely 
along numerous dimensions, including morphological appearance, 
feeding habits, mating behaviors, and physiologies. In recent decades, 
scientists have also added molecular genetic differences to this list. 
Some groups of organisms are clearly more similar to some groups 
than to others. For instance, mallard ducks are more similar to black 
ducks than either is to herons. At the same time, some groups are very 
similar along one dimension, yet strikingly different in other respects. 
Based solely on flying ability, one would group bats and birds 
together; however, in most other respects, bats and birds are very 
dissimilar. How do biologists organize and classify biodiversity? 

In recent decades, methodological and technological advances have 
radically altered how biologists classify organisms and how they view 
the diversity of life. In addition, biologists are better able now to use 
classification schemes for diverse purposes, from examining how traits 
evolve to solving crimes. These advances have strengthened 
evolutionary biology as a theory: a theory in the scientific sense, 
meaning a "mature coherent body of interconnected statements, 
based on reasoning and evidence, that explains a variety of 
observations." 2 Molecular biology, genetics, development, behavior, 
epidemiology, ecology, conservation biology, and forensics are just a 
few of the many fields conceptually united by evolutionary theory. 

A Brief History of Classification 

Taxonomy, the practice of classifying biodiversity, has a venerable 
history. Although early natural historians did not recognize that the 
similarities and differences among organisms were consequences of 
evolutionary mechanisms, they still sought a means to organize 
biological diversity. In 1758 Carl Linne proposed a system that has 
dominated classification for centuries. Linne gave each species two 
names, denoting genus and species (such as Homo sapiens). He then 
grouped genera into families, families into orders, orders into classes, 
classes into phyla, and phyla into kingdoms. Linne identified two 
kingdoms: Animalia (animals) and Plantae (plants). Biologists generally 
accepted the idea of evolution shortly after the publication of Darwin's 
The Origin of Species and, since Linne's classification system, they have 
described an immense number of species. Despite these facts, 
taxonomy changed little until the 1960s. 

The first major break from the Linnean model came from Robert 
Whittaker. In 1969 Whittaker proposed a "five kingdom" system in which 
three kingdoms were added to the animals and plants: Monera (bacteria), 
Protista, and Fungi. Whittaker defined the kingdoms by a number of 
special characteristics. First, he specified whether the organisms possessed 
a true nucleus (eukaryotic) or not (prokaryotic). Because Monera are 
prokaryotic and virtually all are unicellular, they are distinct from the other 
four eukaryotic kingdoms. With few exceptions, the eukaryotic unicellular 
organisms were placed into the kingdom Protista. 

The three multicellular eukaryotic kingdoms distinguish themselves by 
the general manner in which they acquire food. Plants are autotrophs 
and use photosynthetic systems to capture energy from sunlight. 
Animals are heterotrophs and acquire nutrients by ingesting plants or 
other animals, and then digesting those materials. Fungi are also 
heterotrophs but, unlike animals, they generally break down large 
organic molecules in their environment by secreting enzymes. 
Unicellular organisms use a variety of modes of nutrition. (See the 
Microbial Diversity unit.) 

The five kingdoms system was certainly an advance over the previous 
system because it better captured the diversity of life. Three groups — 
bacteria, fungi, and protists — did not fit well into either the animal or 
plant category. Moreover, each of these three groups appeared to 
possess diversity comparable to that of animals or plants. Thus, the 
designation of each as a kingdom seemed fitting. 

In the years since Whittaker's system was developed, however, 
new evidence and new methods have shown that the five-kingdom 
system also fails to adequately capture what we now know about 
the diversity of life. Microbial biologists became aware of these 
limitations as they discovered unicellular organisms that appeared 
to be prokaryotic, but were extremely distinct in ultrastructure and 
other characteristics from the traditional bacteria. Some of these 
unusual prokaryotes lived in hot springs and other places where the 
temperatures were near, or even above, the boiling point of water 
(the thermophiles). Others, the extreme halophiles, were able to 
tolerate salt concentrations as high as five molar, roughly ten times 
the concentration of seawater. (See the Microbial Diversity unit.) DNA 
sequence data also increasingly suggested that these prokaryotes 
were most unlike the traditional bacteria. 

The microbial evolutionist Carl Woese proposed a radical 
reorganization of the five kingdoms into three domains. (See the 
Microbial Diversity unit.) Starting in the 1980s, Woese's scheme has 
been increasingly accepted by evolutionary biologists and is now the 
standard paradigm. In his classification system, Woese placed all four 
eukaryotic kingdoms into a single domain called Eukarya, also known 
as the eukaryotes. He then split the former kingdom of Monera into 
the Eubacteria (bacteria) and the Archaea (archaebacteria) domains. 
Woese then placed most of the "unusual" prokaryotes in the Archaea, 
leaving traditional bacteria in the Eubacteria. The Woese classification 
represents a demotion of the animals and plants as individual 
kingdoms. This is consistent with recent discoveries of more diversity 
among microbes than between animals and plants. 




Unlike Whittaker's five kingdoms system, Woese's three domains 
system organizes biodiversity by evolutionary relationships. After a 
discussion of the methodology of contemporary evolutionary 
classification, we will examine the methods Woese used and the 
justification for his system. 



Figure 1. The older five-kingdom 
tree of life, which has been replaced 
by Woese's three-domain tree. 



Cladistics and Classification 

Except for his last sentence where he used the word "evolved," Charles 
Darwin never mentioned "evolution" in The Origin of Species. Instead, 
he used the phrase "descent with modification." Evolutionary 
classification today is based on those two central features of evolution: 
groups of organisms descend from a common ancestor and, with the 
passage of time, acquire modifications. 

Cladistic analysis, also known as cladistics and phylogenetic systematics, 
is the main approach to classification used in contemporary 
evolutionary biology. The German taxonomist Willi Hennig developed 
cladistics in 1950, but his work was not widely known until it was 
translated into English in 1966. After scientists began using molecular 
data in classification, Hennig's cladistics became increasingly adopted. 

Cladistic analysis starts with the assumption that evolution is a 
branching process: ancestral species split into descendant species, and 
these relationships can be represented much like family trees represent 
genealogies. The "trees" obtained by such analyses are called 
phylogenies. These phylogenies should be viewed as testable 
hypotheses, subject to either confirmation or rejection depending on 
new evidence. Of course, hypotheses differ as to how much support 
they have. Some are so well supported (such as that humans share a 
more recent common ancestor with chimpanzees than either shares with lemurs) 
that they are exceedingly unlikely to be overturned. 

In cladistic analysis, groups of organisms, known as taxa, are arranged 
into clades that are then nested into larger clades. The term "taxa" 
(singular "taxon") can be applied to groups of any size. Taxa that are 
each other's closest relatives are called sister taxa. Each clade should 
be monophyletic; that is, all members share a single common 
ancestor, and all descendants of that ancestor are included in the 
clade. In contrast, a polyphyletic group is one in which the members 
are derived from more than one common ancestor. What if all of a 
particular clade's members share a common ancestor but not all taxa 
that share that common ancestor are included in that group? Such a 
group is called paraphyletic. 
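
The definition of a monophyletic group can be checked mechanically once a tree is written down. The following is a minimal sketch, assuming a tree stored as nested tuples of taxon names; the tree itself is a simplified illustration, and the function names are hypothetical. A group that fails the test is either paraphyletic or polyphyletic; telling those two apart requires information about the ancestors that this sketch does not model.

```python
def leaves(tree):
    """Return the set of taxon names found under a node."""
    if isinstance(tree, str):
        return frozenset([tree])
    return frozenset().union(*(leaves(child) for child in tree))

def smallest_clade(tree, group):
    """Taxa in the smallest clade that contains every member of group."""
    children = () if isinstance(tree, str) else tree
    for child in children:
        if group <= leaves(child):
            return smallest_clade(child, group)
    return leaves(tree)

def is_monophyletic(tree, group):
    """True if the group is exactly one ancestor plus all its descendants."""
    group = frozenset(group)
    return smallest_clade(tree, group) == group

# A simplified, illustrative tree: mammals on one branch, birds on the other.
tree = (("bat", (("human", "chimpanzee"), "lemur")), ("duck", "heron"))

print(is_monophyletic(tree, {"duck", "heron"}))         # the two birds
print(is_monophyletic(tree, {"bat", "duck", "heron"}))  # the fliers
print(is_monophyletic(tree, {"human", "lemur"}))        # leaves out chimpanzee
```

In this toy tree the two birds form a clade, while grouping by flight or omitting the chimpanzee from the primates produces groups that are not monophyletic.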

Taxonomists following cladistic analysis place taxa into clades based on 
the derived character states that the taxa share. For example, a wing is 
a character. The presence or absence of a wing would be alternative 
character states. Other features of a wing (such as its shape and size, 
and how it develops) could also be character states. Aside from the 
presumption that characters are independent of one another, any trait 
can be a character. In principle, there is no difference between the 
analysis of morphological and molecular characters. The characters 
used most often in molecular phylogenies are the nucleotide positions 
of the examined DNA molecule(s); thus, the character states are the 
actual nucleotides at that position. Shared, derived characteristics are 
known as synapomorphies. 

That taxonomists would classify taxa based on similarity makes sense. 
After all, like goes with like. But why would they consider only the 
derived shared character states? Why not consider all character states, 
including those that are primitive? The rationale is that the primitive 
characters do not reveal information about which groups share more 
recent common ancestors; the primitive character states would only 
contribute noise to the system. In classifying different groups of birds 
that all fly, whether they fly does not contribute information. In fact, 
in classifying flightless birds, considering the ancestral state (flighted) 
can actually distort the obtained phylogeny away from the true 
phylogeny. For these reasons, only synapomorphies (shared, derived 
character states) are considered in the analysis. In practice, taxonomists 
often have difficulty in distinguishing between which character states 
are primitive and which are derived. 

For what reasons can taxa share synapomorphies? One possibility is 
that they share a common ancestor. This is called homology. While 
cladistic analysis assumes that most synapomorphies will arise by 
homology, they can arise by other ways. One possibility is 
convergence: different lineages that do not share a recent common 
ancestor evolve to the same character state. An obvious example is 
that both bats and birds have wings; however, these were 
independently derived, most likely owing to similar selective forces. 
This example is obvious because so many other characters place bats 
closer to non-winged clades (other mammals) than to birds. Yet, less 
obvious cases can be resolved only after cladistic analysis. Another 
possible reason why non-homologous character states can be similar is 
a reversal in which mutation or selection causes the derived 
character state to revert to the ancestral state. 

Figure 2. Examples of monophyletic 
(top), polyphyletic (middle), and 
paraphyletic (bottom) trees. 

How does cladistic analysis work, especially given the possibility of 
conflicting data generated by reversals and convergence? Taxonomists, 
like scientists in general, start with the principle of parsimony: that 
the shortest, simplest, and most direct path is most likely to be the 
correct one. In one commonly used method, parsimony analysis, the 
taxonomist searches for the most parsimonious tree; that is, the one 
that requires the fewest evolutionary transitions. Consider 
the example in Figure 3: three possible phylogenies exist. Based on the 
data given, for the top phylogeny to occur, we must postulate a total 
of nine evolutionary changes. The middle phylogeny requires 
postulating ten changes, and the lower phylogeny requires postulating 
eleven changes. Because the first phylogeny requires the fewest 
changes, it is the most parsimonious tree. 
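The counting of changes described above can be sketched in a few lines of code. The sketch below uses one standard counting procedure (usually attributed to Fitch), which the text does not name; the four taxa, the three characters, and the function names are all hypothetical and chosen only for illustration. Every change is weighted equally.

```python
def fitch_changes(tree, states):
    """Return the minimum number of state changes for one character.

    tree   -- nested 2-tuples of taxon names, e.g. (("A", "B"), ("C", "D"))
    states -- dict mapping each taxon name to its character state
    """
    changes = 0

    def visit(node):
        nonlocal changes
        if isinstance(node, str):        # a tip: its set holds only the observed state
            return {states[node]}
        left, right = (visit(child) for child in node)
        shared = left & right
        if shared:                       # the two branches can agree: no change needed
            return shared
        changes += 1                     # the two branches disagree: count one change
        return left | right

    visit(tree)
    return changes


def tree_length(tree, characters):
    """Total number of changes over all characters, each weighted equally."""
    return sum(fitch_changes(tree, states) for states in characters)


# Hypothetical data: three two-state characters scored for four taxa.
characters = [
    {"A": 1, "B": 1, "C": 0, "D": 0},
    {"A": 1, "B": 1, "C": 0, "D": 0},
    {"A": 0, "B": 1, "C": 1, "D": 0},
]
candidate_trees = {
    "AB|CD": (("A", "B"), ("C", "D")),
    "AC|BD": (("A", "C"), ("B", "D")),
    "AD|BC": (("A", "D"), ("B", "C")),
}
lengths = {name: tree_length(t, characters) for name, t in candidate_trees.items()}
best = min(lengths, key=lengths.get)     # the most parsimonious candidate
```

With these made-up characters the three candidate trees require four, six, and five changes, so the first is the most parsimonious. The third character conflicts with the other two, which is the kind of disagreement that convergence or reversal can produce.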

The most parsimonious tree may not necessarily represent the true 
phylogenetic relationships. Perhaps certain types of changes are
more likely, or evolve more easily, than others. It is often difficult
to know before doing the analysis which changes are most likely.
Thus, taxonomists generally resort to the fallback position that all 
changes are equally likely. There are some cases, particularly with 
molecular data, where there is good prior knowledge of variation in 
the likelihoods of different changes. For instance, certain types of 
mutations are more likely than others are. Transitions (changes from a 
purine — A or G — to the other purine, or a pyrimidine — C or T — to 
the other pyrimidine) are more likely than transversions (changes from 
a purine to a pyrimidine or vice versa). Using increasingly sophisticated
statistical techniques, such as maximum likelihood analysis, taxonomists
can adjust for these situations.
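The distinction between transitions and transversions is easy to express in code. The following is a minimal sketch; the function names are illustrative, and the two sequences are assumed to be already aligned.

```python
PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}


def classify_substitution(old, new):
    """Label a single-nucleotide change as a transition or a transversion."""
    old, new = old.upper(), new.upper()
    if old == new:
        return "no change"
    # A transition stays within one chemical class; a transversion crosses classes.
    same_class = ({old, new} <= PURINES) or ({old, new} <= PYRIMIDINES)
    return "transition" if same_class else "transversion"


def count_substitutions(seq1, seq2):
    """Count transitions and transversions between two aligned sequences."""
    counts = {"transition": 0, "transversion": 0}
    for a, b in zip(seq1, seq2):
        kind = classify_substitution(a, b)
        if kind in counts:
            counts[kind] += 1
    return counts


# Hypothetical aligned sequences, for illustration only.
example_counts = count_substitutions("AACGT", "GACTA")
```

A weighted analysis could then assign a different cost to each kind of change instead of treating all changes as equally likely.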

Figure 4 shows an example of an unrooted tree. Unrooted trees do 
not display the directionality of evolution, only patterns of relatedness. 
An unrooted tree can be rooted, but for any given unrooted tree there
are many possible rooted trees that can be derived. Rooting a tree 
usually requires identification and use of an outgroup — a taxon that 
is more distantly related than the taxa contained within the tree. For 
instance, given an unrooted tree containing the apes (humans,
chimpanzees, gorillas, orangutans, and gibbons), one could use a
species of monkey, such as a baboon, as an outgroup. (See the Human
Evolution unit.) In practice, taxonomists often use multiple outgroups 
to refine the analyses.

Figure 3, Three possible unrooted
trees are shown. The top tree
assumes nine changes in character
state occurred (each change is
represented by a mark), the middle
tree assumes ten changes, and the
bottom tree assumes eleven.
Because the top tree assumes the
fewest changes, it is the most
parsimonious tree.



Figure 4, Panel A shows an unrooted tree.
Panels B, C, D, and E show the
resulting rooted trees when the root is placed
in each of the corresponding positions.




Applications of Molecular Phylogenetics 

Although the methods used in cladistic analysis are the same for both 
molecular and morphological characters, molecular data provides 
several advantages. First, molecular data offers a large and essentially 
limitless set of characters. Each nucleotide position, in theory, can be 
considered a character and assumed independent. The DNA of any 
given organism has millions to billions of nucleotide positions. In 
addition, the large size of the genome makes it unlikely that natural 
selection will be strongly driving changes at any particular nucleotide. 
Instead, most nucleotide changes are "unseen" by natural selection, 
subject only to mutation and random genetic drift. If we were to 
assume that the driving force of natural selection is less prevalent for
molecular characters, then we should assume that the probability of
convergence for molecular characters is also lower.

By selecting a particular class of morphological characters, researchers 
may also bias the analysis in such a way that groups with certain 
characteristics cluster with others for reasons other than homology. For 
instance, if the set of characters were weighted toward those involved 
in carnivory, carnivorous animals may cluster together — not because 
of homology but because of shared function. This problem would be 
less likely if using molecular characters. 

Another advantage of molecular data is that all known life is based on 
nucleic acids; thus, studies involving any type of taxa can use DNA 
sequence data. Some genes or regions of genes evolve quickly. These 
are most useful in studies of closely related taxa. Conversely, other 
genes (or regions) are slower to evolve. These are the most useful for 
studies of more distantly related organisms. At the extreme, some
evolutionarily related genes have been found in organisms as disparate
as yeast and humans. Rates at which sections of DNA evolve are
primarily determined by the extent of functional constraint. Genes and
positions within genes that are the most functionally important generally
evolve the slowest. This is because they are the least able to tolerate mutational
change without substantially reducing the fitness of the individuals 
that harbor them. Many of these very conserved genes play a role in 
development. (See the Genes and Development unit.) 

Starting in the late 1970s, Carl Woese took on an ambitious project:
determining the relationships of all life. The work resulted in the
reorganization of the tree of life. To do this, Woese and his associates
took advantage of a molecule that evolves extremely slowly: rDNA,
the DNA that encodes small-subunit ribosomal RNA. They found
that the sequences cluster in three groups corresponding to the 
eukaryotes (Eukarya), the archaea, and the eubacteria. We discussed 
these three domains earlier. 

The three-domains model was controversial for several reasons. First, 
the conclusions Woese drew were initially based on evidence from a 
single gene. Perhaps there was something unusual about the way that
small-subunit rDNA evolved, his critics said. That controversy was
easily solved by generating more data. Sequences from other genes 
that evolve slowly seemed to confirm the rationale for the three 
domains. A more fundamental problem was that Woese's tree was 
unrooted. If each domain represents a monophyletic group, three 
possibilities existed: (1) that the eubacteria and archaea are sister 
groups, with the eukaryotes branching off first; (2) that eubacteria and
eukaryotes are sister groups; or (3), that archaea and eukaryotes are
sister groups. Woese himself suspected this third possibility. A fourth 
possibility was that the root of the tree lay within one of the domains
and, therefore, the domain was not monophyletic. To root a tree, one 
generally requires an outgroup. But what is the outgroup to all known 
life? Rocks? 

Margaret Dayhoff proposed an ingenious solution to this rooting 
dilemma: using ancestral genes that are present in multiple copies in 
the same organism because of gene duplication. If there were such 
genes that had duplicated before the split among the three domains, 
these could be used as outgroups to root the tree of life. In 1989, many 
years after Dayhoff's suggestion, Naoyuki Iwabe and colleagues used
this approach. 3 Organisms in all three domains have two distinct genes 
that code for the two subunits (alpha and beta) of the enzyme that 
hydrolyzes ATP to yield energy, ATPase. DNA sequence similarity 
strongly suggests that these two genes are derived from a gene 
duplication predating the divergence of the domains. The ATPase-alpha
tree, using an ATPase-beta gene as an outgroup, showed that
each of the domains was monophyletic, and that eukaryotes and
archaea are sister groups. The same result was obtained when
ATPase-alpha was used as an outgroup to root the ATPase-beta tree. Similar
trees were obtained with other pairs of duplicated genes. In
conclusion, Woese was right.

HIV and Forensic Uses of Phylogenetics 

Phylogenetic methods have been used to solve practical problems, 
including determining the sources of infection from HIV. This retrovirus 
evolves at an extremely rapid rate, owing to its exceptionally high 
mutation rate. In fact, sequences of HIV genes taken from the same 
infected individual can be as different as sequences from some 
homologous genes in humans and birds. Its rapidity of evolution works 
to HIV's advantage as it wreaks havoc on the immune system. On the 
other hand, scientists can take advantage of that rapid evolution to 
study the relationships between HIV and other similar viruses. 

Researchers at the Centers for Disease Control and Prevention (CDC) 
used phylogenetic systematics of HIV for forensic purposes. During 
the early 1990s a Florida dentist was suspected of transmitting HIV 
to several of his patients. After the first case of probable 
transmission surfaced, the dentist wrote an open letter to his 
patients suggesting that they be tested for HIV. At least ten of the 
patients tested positive for HIV. However, a few of the infected 
individuals had other risk factors; therefore, there was the distinct 
possibility that they had not been infected by the dentist. The CDC 
researchers sequenced the HIV gp120 gene from several viral isolates 
taken from the dentist, his infected patients, and non-patients who 
were also infected. From the phylogeny constructed based on the 
HIV sequence data, they first denoted what they called the "dentist
clade." This monophyletic group contained the HIV
sequences collected from the dentist but none from the non-patients.
Five of the patients had viral sequences that were contained in the 
dentist clade. These patients also lacked other risk factors. Thus, by 
strong inference, the CDC researchers determined that the dentist 
had infected these five patients.

There was some controversy over whether or not the dentist clade
identified in the CDC study was reliable. Nucleotides in the HIV gp120
gene do not evolve in the same way as in other genes. Instead of
transitions being universally more prevalent than transversions, as is
the case in most genes, A to C transversions are more frequent than
transitions of C to T. There was also concern about the types of
algorithms used. To address these concerns, David Hillis, John
Huelsenbeck, and Cliff Cunningham re-analyzed the data of the CDC
study. They found that, under nearly all circumstances, the same dentist
clade was obtained. 4 Thus, the results were statistically reliable.
Investigators are using similar studies to determine the source of the 
anthrax used in the attacks of October 2001. 
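The question at the heart of the dentist study, whether a set of sequences forms a monophyletic group, can be checked mechanically once a rooted tree is in hand. The sketch below assumes a tree written as nested tuples. The tree and its labels are entirely hypothetical and do not reproduce the CDC data.

```python
def tips(node):
    """Return the set of tip labels found at or below a node."""
    if isinstance(node, str):
        return {node}
    return set().union(*(tips(child) for child in node))


def is_monophyletic(tree, group):
    """True if some node of the rooted tree has exactly `group` as its tips."""
    group = set(group)
    if tips(tree) == group:
        return True
    if isinstance(tree, str):
        return False
    return any(is_monophyletic(child, group) for child in tree)


# Hypothetical rooted tree; labels are illustrative only.
toy_tree = (
    ("control1", (("dentist", "patientA"), "patientB")),
    ("control2", "patientC"),
)
clade_ok = is_monophyletic(toy_tree, {"dentist", "patientA", "patientB"})
clade_bad = is_monophyletic(toy_tree, {"dentist", "patientC"})
```

In this toy tree the dentist, patient A, and patient B share a node that excludes everyone else, so they form a clade; the dentist and patient C do not.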



The Origin of Bats and Flight 

Molecular phylogenetics is often most useful when there is conflict
among the phylogenies constructed with different morphological 
character data sets. For instance, molecular data have helped settle the 
question of whether bats are a monophyletic group — that is, whether 
they share a common ancestor not shared by non-bats. In the 1980s 
several morphological analyses challenged the traditional view that 
bats (order Chiroptera) were monophyletic. The studies proposed that 
the large fruit-eating Megachiroptera (megabats) were actually more 
closely related to primates than they were to the smaller insect-eating 
Microchiroptera (microbats). The studies based the megabat-primate 
grouping on synapomorphies that included features of the penis, 
brain, and limbs. The implication of this reclassification was that flight 
evolved more than once within mammals. 

Spurred by this controversy, several research groups performed 
cladistic analyses of bats using molecular data during the early 
1990s. For example, Loren Ammerman and David Hillis sequenced
mitochondrial DNA from many mammals, including two
species of microbats, two species of megabats, a tree shrew, 
a primate, and several outgroups. From their data, the most 
parsimonious tree that assumed bat monophyly was ten steps 
shorter than the most parsimonious tree that assumed bats were not 
monophyletic. Statistical analysis showed that bat monophyly was 
significantly more parsimonious than the absence of bat monophyly. 



Figure 5, Alternative possibilities of
bat phylogeny. Left: Bats form a
monophyletic clade, in which flight
evolved once in mammals. Right:
Bats are diphyletic, and flight
evolved twice in mammals.

Other molecular phylogenetic studies, using a variety of different
classes of genes, showed the same pattern of bat monophyly. These 
researchers also indicated that convergence is the most likely reason 
why some derived morphological character states seem to be shared 
by primates and bats. 5 

Other researchers raised the objection that these early molecular 
phylogenetic studies did not take into account biases in the way that 
sequences evolve. Specifically, the critics noted that both microbats and 
megabats have DNA with a higher proportion of G's and C's than A's
and T's. It is well known that organisms that have higher metabolic 
rates will have higher G-C content. Thus, the critics argued, perhaps 
the apparent monophyly of bats that was observed in the molecular 
studies is due to convergent evolution toward high G-C content and 
not homology. Using various methods, subsequent molecular 
phylogenetic studies took the bias in nucleotide changes into account. 
One simple method was to split the DNA sequences into A-T rich and 
G-C rich regions and do a separate analysis on each. Even after 
nucleotide sequence bias was discounted, the most parsimonious 
phylogenies still showed that all bats had a single common ancestor. 
This support for bats as a monophyletic group is also strong evidence 
for flight evolving only once in mammals. 
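The simple method of splitting sequences by base composition can be sketched as follows. This is only an illustration: the 0.5 threshold, the function names, and the example regions are assumptions and are not taken from the studies described above.

```python
def gc_content(seq):
    """Fraction of unambiguous bases (A, C, G, T) that are G or C."""
    seq = seq.upper()
    gc = sum(seq.count(base) for base in "GC")
    total = gc + sum(seq.count(base) for base in "AT")
    return gc / total if total else 0.0


def split_by_gc(regions, threshold=0.5):
    """Partition named sequence regions into G-C rich and A-T rich sets.

    The threshold is an assumed cutoff, not a published value.
    """
    gc_rich, at_rich = {}, {}
    for name, seq in regions.items():
        target = gc_rich if gc_content(seq) > threshold else at_rich
        target[name] = seq
    return gc_rich, at_rich


# Hypothetical regions, for illustration only.
gc_rich, at_rich = split_by_gc({"region1": "GGCCGA", "region2": "ATTAAC"})
```

Each subset could then be analyzed separately, and the resulting trees compared to see whether the grouping of interest survives in both.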

The monophyly of bats is an example where molecular data shored up 
the traditional phylogeny against challenges posed by some 
morphological characters. In contrast, there are also occasions where 
analysis of the molecular data provided an unexpected answer. One
such example is the evolutionary history of whales,
which is discussed in detail in the video.

Challenges 

There have been tremendous advances in comparative evolution 
brought on by the new methods of phylogenetic analysis and 
burgeoning amounts of DNA sequence data; however, the field is not 
without challenges and limitations. Some of these challenges are due 
to features of the organism and some are due to limitations of the 
tools we currently possess. 

One feature of the organism that presents a challenge is the horizontal 
transfer of genes across different species. In the standard mode of 
vertical transmission, genes are transmitted from parent to offspring 
(whether by sexual or asexual means). Genetic material can also be 
exchanged among different organisms, especially bacteria. This 
general type of transmission is called lateral (horizontal) gene 
transfer. One mode by which lateral gene transfer can occur is 
conjugation, whereby some bacteria exchange genes (plasmids or 
small parts of the bacterial chromosome) by physical contact. 
Bacteriophages can also mediate lateral gene transfer by cross- 
infection. Amazingly, these processes that result in lateral gene 
transfer can occur among bacteria that differ by as much as fifteen 
percent at the DNA sequence level. The implication of widespread and 
random lateral transfer of genes is that the genetic structure of 
bacteria can be mosaic — different genes or gene regions may have 
different histories. If lateral transfer is sufficiently pervasive, it could
make it impossible to construct the true phylogeny for all bacteria.
(See the Microbial Diversity unit.) 



Figure 5a, Photographs of an example 
of a megabat, the African fruit bat, and 
a microbat, the Mexican freetail bat. 




Courtesy of the Transvaal Museum, South Africa.

The most dramatic case of lateral gene transfer involving eukaryotes is
the endosymbiotic origin of mitochondria. This view, championed by
Lynn Margulis, holds that these ATP-producing organelles were
once free-living prokaryotes that were engulfed by a proto-eukaryote,
an idea now strongly supported. The evidence includes similarities
of ribosomal structure, sensitivity to antibiotics, and DNA sequences
between mitochondria and prokaryotes. The major controversy is how
and when this process occurred. Other eukaryotic organelles probably
also have endosymbiotic origins. The conventional
wisdom, however, is that lateral gene transfer involving eukaryotes
was limited to these exceedingly rare endosymbiotic events.

Recent evidence strongly suggests that lateral gene transfer involving 
eukaryotes may be more prevalent than once thought. In some DNA 
sequences, bacterial or archaeal sequences cluster in clades that are 
otherwise strictly eukaryotic. The extent to which lateral gene transfer 
among the kingdoms and within the eukaryotes has occurred is still a 
matter of controversy and inquiry. The implications for our ability to 
construct accurate phylogenies for these "deep" relationships are also
controversial. There appears to be a continuum of the degree to which
different genes transfer across distantly related taxa. Some researchers
have argued that we may be able to get around the problem of lateral
gene transfer by choosing genes that display very little, if any,
horizontal gene transfer.

Figure 6, This view of early evolution
suggests multiple primitive cells as
ancestors to the three domains, and
illustrates lateral gene transfer among
early organisms.

Another major challenge to comparative evolution is that the
methodology of phylogenetic systematics is computationally
intensive. The number of potential trees increases extremely quickly
(faster than exponentially) as the number of taxa increases. For
three taxa, there are only three possible rooted trees. For a given data
set, one can readily determine by inspection which tree is the most
parsimonious. Given seven taxa, it would be exceedingly painstaking
for a person to search for the most parsimonious tree through the
10,395 rooted possibilities; however, a desktop computer with the
correct software could search among all of these possibilities in a tiny
fraction of a second.
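The counts quoted above follow from a standard result: for n labeled taxa, the number of distinct rooted, strictly bifurcating trees is the product of the odd numbers from 3 up to 2n - 3. A minimal sketch, with an illustrative function name:

```python
def rooted_tree_count(n_taxa):
    """Number of rooted, bifurcating trees for n labeled taxa.

    This is the product 3 * 5 * ... * (2n - 3); it equals 1 when n is 2 or less.
    """
    count = 1
    for odd in range(3, 2 * n_taxa - 2, 2):
        count *= odd
    return count


counts = {n: rooted_tree_count(n) for n in (3, 7)}
```

Each added taxon multiplies the count by the next odd number, which is why the growth outpaces any fixed exponential rate.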

Increasing computing power alone will not solve this problem. At 
twenty taxa, the number of possible rooted trees exceeds 8 × 10^21,
a number of similar magnitude to the total number of cells in all
living human beings. Soon after this point, it becomes impractical for
computers to search through all the possibilities to find the most 
parsimonious one. Given fifty taxa, it would take literally longer than 
the age of the universe to search through every single possible 
unrooted tree — even if computers were a million times faster than they 
are now. Therefore, phylogenetic systematics must employ methods 
other than searching every single possible tree when evaluating data 
sets that involve a large number of taxa. One method is to collapse taxa 
that are known (by other information) to be close relatives into a single 
taxon to make the analysis more feasible. Researchers have also used 
various searching approaches, sometimes called heuristics. These
approaches use algorithms to identify regions of "tree space" that are
likely to contain very parsimonious trees. These heuristic methods may 
not always identify the best tree, but they will identify trees that are 
nearly as parsimonious as the best tree most of the time. 
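A rough calculation shows why exhaustive search fails for fifty taxa. The sketch below relies on stated assumptions: the count of unrooted bifurcating trees is the product of odd numbers from 3 to 2n - 5, a present-day computer is assumed to evaluate one billion trees per second, and the age of the universe is taken as roughly 13.8 billion years. The rate in particular is a placeholder, not a measured figure.

```python
import math

SECONDS_PER_YEAR = 365.25 * 24 * 3600
AGE_OF_UNIVERSE_YEARS = 1.38e10      # assumed, roughly 13.8 billion years
TREES_PER_SECOND = 1e9 * 1e6         # assumed rate today, made a million times faster


def unrooted_tree_count(n_taxa):
    """Number of unrooted, bifurcating trees for n labeled taxa: 3 * 5 * ... * (2n - 5)."""
    return math.prod(range(3, 2 * n_taxa - 4, 2))


def exhaustive_search_years(n_taxa, trees_per_second=TREES_PER_SECOND):
    """Years needed to evaluate every unrooted tree at the given rate."""
    return unrooted_tree_count(n_taxa) / trees_per_second / SECONDS_PER_YEAR


years_for_50 = exhaustive_search_years(50)
exceeds_universe = years_for_50 > AGE_OF_UNIVERSE_YEARS
```

Under these assumptions the search time for fifty taxa exceeds the age of the universe by dozens of orders of magnitude, so even a far more generous rate would not change the conclusion.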

Coda: The Renaissance of 
Comparative Biology 

We are witnessing a renewal of interest in comparative approaches to 
studying function. Biology in the 1800s was almost entirely 
comparative. In the twentieth century we moved into a strongly 
reductionistic period of genetics, developmental biology, and 
physiology. This trend only intensified with the rise of molecular 
biology, particularly after the elucidation of the structure of DNA in 
1953. At that time, comparative biology was marginalized as just 
"natural history." At the turn of the twenty-first century comparative 
approaches have staged a strong comeback. In large part, this 
renaissance is due to the revolution in data gathering (particularly of 
DNA sequences) and the effort already devoted to establishing 
particular model systems. In contrast to the comparative biology of the
nineteenth century, today's comparative evolutionary biology rests on a
strong foundation of functional genetics.

Further Reading

Books

Freeman, S., and J. C. Herron. 2001. Evolutionary analysis. 2d ed. Upper
Saddle River, NJ: Prentice Hall.

An excellent inquiry-based college-level textbook on evolution.
It is somewhat more accessible than Futuyma's textbook.

Futuyma, D. J. 1998. Evolutionary biology. 3d ed. Sunderland, MA:
Sinauer Associates.

This is perhaps the most comprehensive textbook on evolutionary
biology. It also provides an excellent entry into the primary
literature of evolutionary biology.

Article

Hillis, D. M., J. P. Huelsenbeck, and C. W. Cunningham. 1994.
Application and accuracy of molecular phylogenies. Science
264:671-77.

A technical review of the state of phylogenetic systematics as of
the middle 1990s.

Glossary



Clade. An organizational term used in cladistics to describe a group
of related organisms being compared.

Conjugation. Cell-to-cell contact in which DNA copied from a plasmid
or chromosome is transferred to a recipient cell. It can contribute
to lateral gene transfer when it occurs between distantly related
bacteria.

Convergence. The phenomenon where more distantly related lineages
have similar features due to the operation of similar evolutionary
forces.

Eukarya. The domain of all eukaryotic organisms. Eukaryotes are
single-celled or multicellular organisms with cells that have a
membrane-enclosed nucleus and usually other organelles.

Homology (homologous). Similarity of genes or other features of
organisms due to shared ancestry.

Lateral gene transfer. (Also referred to as horizontal gene
transfer.) The transmission of genes directly between organisms,
particularly bacteria, and not from parent to offspring.

Monophyletic. A clade, or group, of organisms that includes a
common ancestor and all of its descendants.

Outgroup. A more distantly related group or organism used for the
purpose of comparison.



Paraphyletic. An incomplete clade of related organisms from a common
ancestor.

Parsimony analysis. A method used to create phylogenies of organisms
based on the assumption that the evolution of characters occurs by
the simplest (most parsimonious) path.

Phylogeny. A tree-like diagram used to represent evolutionary
relationships between species or groups.

Polyphyletic. A clade containing related groups of organisms derived
from several unrelated ancestors.

Reversal. A phenomenon wherein mutation or selection causes a derived
character state to revert back to the ancestral state.

Rooted tree. A phylogeny in which the evolutionary ancestor is known.

Sister taxa. The most closely related groups of organisms in a
phylogeny.

Synapomorphies. Derived character states that are shared by two or
more taxa.

Taxa. Groups or representatives of related organisms that are being
compared; they can vary in hierarchical level (such as genus, family,
order, and so on).

Unrooted tree. A phylogeny in which the evolutionary ancestor is not
known.

Microbial Diversity

"If we look we'll find 'em... the microbes are there.
They're these little packages of secrets that are waiting
to be opened." Anna-Louise Reysenbach, PhD

Introduction 

Microbes flourish. Inside your gut, in the mucky soil of a marsh, in 
Antarctic ice, in the hot springs of Yellowstone, in habitats seemingly 
incompatible with life, microbes flourish. 

They were present on Earth 3.5 to 4 billion years ago, and they've been 
evolving and expanding into new environments ever since. Replicating 
quickly, exchanging genetic material with each other and with other 
organisms, bacteria and archaea have become ubiquitous. 

Not only are they everywhere, but these tiny organisms also 
manipulate the environments in which they live. Their presence has 
driven the development of new ecosystems — some of which allowed 
for the evolution of more complex organisms. Without microbes, the 
recycling of essential nutrients on Earth would halt. Microbes 
communicate; some generate the signals for the formation of 
metabolically diverse communities. Some use sophisticated signaling to 
establish complex relationships with higher organisms. 

In this unit we will examine examples of the broad diversity of 
microorganisms and consider their roles in various ecosystems, both 
natural and man-made. We will also discuss some of the practical 
applications that derive from the wealth of metabolic diversity that 
microorganisms possess. 

Let's start at the beginning... three or four billion years ago. 

Microbes as the First Organisms 

No one knows for certain where life began. Hot springs and volcanic 
(hydrothermal) vents on the ocean floor, however, may represent the 
kinds of environments where cellular life began. Before the ozone 
layer formed, the surface of Earth was exposed to strong radiation. 
Thus, most of the Earth's earliest organisms probably developed 
beneath the terrestrial surface or in the oceans. It's likely that these 
early microbes adapted to the high temperatures associated with 
abundant volcanic activity. Geological turmoil resulted in the 
accumulation of carbon dioxide in the atmosphere. 

Sometime later, about 2 or 2.5 billion years ago, gaseous oxygen began
to appear. Unlike the carbon dioxide, oxygen almost certainly came
about because of microbes. Microbes similar to today's cyanobacteria
were present at this time. We know this based on the presence of
stromatolites (fossilized microbial mats consisting of layers of
filamentous prokaryotes and trapped sediment) that date back to
that time. Stromatolite-forming bacteria obtain carbon from carbon
dioxide and get their energy by photosynthesis, splitting water to 
generate oxygen gas in the process. These organisms brought the 
oxygen level in Earth's atmosphere to about ten percent of what it is 
today — enough to allow the evolution of oxygen-using organisms. 
Gaseous oxygen also contributed to the formation of the ozone layer, 
which blocks UV radiation. New terrestrial habitats were now open for 
an evolving diversity of microbes. 

The Diversity of Microbial Metabolism 

The diverse environments on Earth today present energy, carbon,
and other nutrients in varying forms. They also vary with respect to
temperature, acidity, and the availability of byproducts from other 
organisms. Microbes thrive in a vast array of these environments. 

Microorganisms vary with regard to the sources of energy they use for 
assembling macromolecules and other cellular components from 
smaller molecules. Phototrophs obtain their energy from light; 
chemotrophs use chemicals as energy sources. (Troph is derived from a 
Greek word meaning "to feed.") Many organisms use organic 
compounds as sources of energy; these are the chemoorganotrophs. 
In contrast, the chemolithotrophs use inorganic chemicals as 
energy sources. 

Microorganisms also vary with respect to the source of carbon they 
use. Autotrophs are able to build organic molecules from carbon 
dioxide. Heterotrophs, the "other feeders," obtain their carbon 
from organic compounds — amino acids, fatty acids, sugars, and so 
on — of autotrophs. 

These terms are often combined. So, a "photoautotroph" is an
organism that, like plants, gets its energy from light and its carbon
from CO2. Decomposers are often chemoheterotrophs; they may
obtain energy and carbon from the same source.

So what metabolic classes might microbes found in a deep-sea
hydrothermal vent fall within? The lack of sunlight makes them
dependent on chemical energy; thus, they are chemotrophs. Carbon 
dioxide dissolved in the ocean is their source of carbon; they are 
autotrophs. Organic material from decomposing phototrophs is not 
abundant, so these organisms rely on inorganic sources for energy. 
They may use H2 (present in magmatic gases), reduced sulfur
compounds, or methane as a source of energy. They are also 
thermophiles, growing optimally at temperatures above 45°C. 
Thermophilic chemolithoautotrophs serve as primary producers, the 
first organisms in food chains that include animals such as tube worms 
and giant clams. 
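The naming scheme described in this section is compositional: a prefix for the energy source, an optional middle part for the kind of chemical used, and a suffix for the carbon source. A minimal sketch follows; the function and parameter names are hypothetical.

```python
def metabolic_class(energy, carbon, chemical_energy_source=None):
    """Build a name such as 'photoautotroph' from an organism's sources.

    energy                 -- "light" or "chemical"
    carbon                 -- "CO2" or "organic"
    chemical_energy_source -- optional: "organic" or "inorganic"
    """
    prefix = {"light": "photo", "chemical": "chemo"}[energy]
    middle = {None: "", "organic": "organo", "inorganic": "litho"}[chemical_energy_source]
    suffix = {"CO2": "autotroph", "organic": "heterotroph"}[carbon]
    return prefix + middle + suffix


plant_like = metabolic_class("light", "CO2")
decomposer = metabolic_class("chemical", "organic")
vent_microbe = metabolic_class("chemical", "CO2", "inorganic")
```

The three examples give the photoautotroph, the chemoheterotroph, and the chemolithoautotroph of the hydrothermal vent, matching the terms used in the text.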

Archaea and Bacteria 

As reviewed in "Evolution and Phylogenetics," living organisms can be 
grouped into three domains: the Archaea, the Bacteria, and the 
Eukarya. Members of Bacteria and Archaea are prokaryotes: single- 
celled organisms lacking true nuclei and other membrane-enclosed 
organelles. Bacteria and archaea, however, differ in cell wall
characteristics and membrane lipid composition. They also differ in
RNA polymerase structure and, therefore, in transcription.

Many extremophiles (organisms that tolerate high or low 
temperature, high salinity, or extreme pH) fall within the Archaea. 
Some archaea, the extreme halophiles (salt lovers), tolerate salt
concentrations nearly ten times that of seawater. They
thrive in the Great Salt Lake and the Dead
Sea. Nevertheless, habitat alone does not differentiate the groups.
Some bacteria grow at temperatures above 80°C, and some archaea 
have been found in environments not considered extreme. For 
example, methanogenic archaea live in anoxic sediments in marshes 
and are used in sewage treatment facilities. Another archaean, 
Methanobrevibacter smithii, lives and generates methane in the 
human colon.

The Universal Tree of Life

Starting in the 1970s Carl Woese proposed that variation in the 
sequences of DNA encoding ribosomal RNA (rRNA) in different 
organisms would provide valuable information regarding evolutionary 
relatedness. rRNA is an integral part of ribosomal structure, so it is 
found in all organisms. After comparing the small variations between 
the genes for rRNA from many organisms, Woese suggested that the 
Archaea constitute a unique domain of life, a grouping broader than 
kingdom. The genomes of several members of the Archaea have been 
entirely sequenced and have been compared with the genomes of 
other organisms. Such studies confirm that Archaea constitute a 
separate group: These organisms contain hundreds of genes with no 
counterparts in Bacteria or Eukarya. Unexpectedly, ribosomal proteins 
from Archaea were found to be more similar to those of Eukarya than 
to Bacterial ribosomal proteins. So, Archaea and Eukarya seem more 
closely related than Bacteria and Eukarya. (See the Evolution and 
Phylogenetics unit.) 
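Comparing "small variations" between sequences can be made concrete with the simplest possible measure: the proportion of aligned positions at which two sequences differ. The sketch below is only an illustration. The three short sequences are invented, not real rRNA genes, and they were constructed so that the outcome mirrors the relationship reported above; they are not evidence for it. Actual studies use long alignments and the tree-building methods of the Evolution and Phylogenetics unit.

```python
from itertools import combinations


def p_distance(seq1, seq2):
    """Proportion of aligned, ungapped positions at which two sequences differ."""
    if len(seq1) != len(seq2):
        raise ValueError("sequences must be aligned to the same length")
    compared = [(a, b) for a, b in zip(seq1, seq2) if a != "-" and b != "-"]
    if not compared:
        raise ValueError("no ungapped positions to compare")
    differing = sum(1 for a, b in compared if a != b)
    return differing / len(compared)


# Invented toy sequences, for illustration only.
toy_sequences = {
    "archaeon": "GATTACAGGC",
    "eukaryote": "GATTACTGGC",
    "bacterium": "GCTTGCAGAC",
}
distances = {
    (a, b): p_distance(toy_sequences[a], toy_sequences[b])
    for a, b in combinations(toy_sequences, 2)
}
closest_pair = min(distances, key=distances.get)
```

For these toy sequences the archaeon and the eukaryote differ at one position in ten, fewer than either differs from the bacterium.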

Can we construct a tree illustrating the relatedness of the three 
domains, with one common ancestor for all life? Woese and his 
colleagues have argued — based on phylogenetic methodology and 
data from several genes — that there is a common ancestor. They 
further argue that Archaea and Eukarya are more closely related to 
each other, and that Bacteria diverged from the common ancestor first. 
(See the Evolution and Phylogenetics unit.) 

Other biologists have countered that the true universal tree of life may 
be more complicated than the picture that Woese and his colleagues 
presented. The complication is lateral gene transfer, where 
individuals exchange genes with one another. Although not
generally exhibited in Eukarya, mechanisms for lateral gene transfer 
(also known as "horizontal gene transfer") are well known in Bacteria. 
Genes are exchanged between bacterial species by the action of viruses 
and by conjugation (cell-to-cell contact in which DNA copied from a 
plasmid or chromosome is transferred to a recipient cell). Under special 
conditions, some bacteria are known to take up "naked" DNA from 
the environment. 

Lateral gene transfer, if restricted to very similar organisms, would not 
pose a problem for constructing a universal tree of life. However, there 
is evidence that genes have been exchanged between very distant 
organisms. Eukarya acquired mitochondrial and chloroplast DNA from 
Bacteria. Nuclear genes in eukaryotes seem to be derived from Bacteria 
as well, not just from Archaea. Genes are also shared between Archaea 
and Bacteria. Twenty-four percent of the genome of the bacterium
Thermotoga maritima consists of archaeal DNA. Similarly, the archaean
Archaeoglobus fulgidus has numerous bacterial genes. Some scientists 
believe that a more diverse community of primitive cells gave rise to 
the three domains and that the notion of a single universal ancestor 
might be replaced. W. Ford Doolittle (Dalhousie University) has 
suggested that lateral gene transfer among early organisms has 
generated a "tree of life" which more closely resembles a shrub — 
with untreelike links (shared genes) connecting the branches (Fig. 3). 

Figure 3. Proposed by W. Ford Doolittle, this view of early evolution
suggests multiple primitive cells as ancestors to the three domains and
illustrates lateral gene transfer among early organisms.



Studying Unculturable Microbes with PCR 

Imagine yourself on a team studying archaea at a deep-sea hydro- 
thermal vent at the Galapagos Rift (an area known for its hydrothermal 
activity). You've found a new microbe. What do you want to know 
about it? What metabolic class does the microbe fall within? Does it 
make certain proteins? How does it survive the volcanic heat? 
Traditionally, asking such questions involved growing microbes in the 
laboratory. Unfortunately, replicating the conditions in which many 
bacteria and archaea grow is very difficult. For this reason, only a small 
fraction (perhaps as few as one percent) of the microorganisms in
nature has been cultivated. To identify and compare unculturable 
organisms microbiologists have turned to molecular genetic techniques. 

Polymerase chain reaction (PCR) is one technique for studying 
organisms that cannot be grown in the laboratory. When only a small 
quantity of DNA is available from a particular source, PCR can be used 
to amplify that DNA and produce billions of copies of a designated 
gene-sized fragment. The technique has many applications, including 
the amplification of DNA from crime scenes, analysis of cancer genes, 
and identification of pathogens. When an environmental sample 
contains unculturable organisms, scientists can use PCR to generate 
copies of microbial genes suitable for comparison. 

To replicate DNA in vitro, PCR takes advantage of a special property of 
the molecule: the hydrogen bonds. These bonds, which bind the 
complementary strands of DNA together in a double helix, are broken 
at elevated temperatures (about 95°C). Each single-stranded piece of 
DNA (ssDNA) is then built upon to form a new, double-stranded 
molecule (dsDNA). To initiate this, short "primers" — specific ssDNA 
fragments called oligonucleotides — must anneal to complementary 
regions on the single-stranded DNA. Deoxynucleotides (A, T, G, and C)
and DNA polymerase are added and, in a process called primer 
extension, the complementary copy of the ssDNA fragment is built. The 
result is two double-stranded DNA molecules identical to the original.
Repeating these steps thirty times can result in a roughly 10⁹-fold
amplification of the original molecule (2³⁰ is about 1.07 × 10⁹).
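
The arithmetic behind that figure can be checked directly. The short
Python sketch below assumes ideal doubling in every cycle. The function
name and the optional efficiency parameter are illustrative assumptions,
not part of any standard PCR software.

```python
def fold_amplification(cycles: int, efficiency: float = 1.0) -> float:
    """Return the fold increase in target copies after `cycles` rounds.

    efficiency = 1.0 means every template is copied in every cycle
    (perfect doubling). Real reactions fall short of this, so the
    default should be read as an upper bound.
    """
    if cycles < 0 or not 0.0 <= efficiency <= 1.0:
        raise ValueError("cycles must be >= 0 and efficiency in [0, 1]")
    return (1.0 + efficiency) ** cycles


if __name__ == "__main__":
    for n in (10, 20, 30):
        print(f"{n} cycles: about {fold_amplification(n):.2e}-fold")
```

With perfect doubling, thirty cycles give 2 to the 30th power, a little
over one billion copies per starting molecule. Lowering the assumed
efficiency shows how quickly the yield drops below that ideal.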

Figure 4. 1) Double-stranded DNA in the sample is heated to generate
single strands. 2) Sequence-specific primers are added, which anneal to
desired sites on the DNA. 3) Nucleotides and heat-tolerant DNA
polymerase allow for primer extension at elevated temperatures. 4) The
result is two new copies of double-stranded DNA. The process is
repeated to generate multiple specific dsDNA molecules.

Careful thermal cycling is required for PCR to proceed. For the primers
to anneal to the ssDNA fragments, the temperature is reduced to 
about 55°C. However, at this temperature the original complementary 
ssDNA fragments will begin to re-anneal with each other. A high 
concentration of primers, and the tendency of the shorter primer 
strands to anneal more readily, ensures primer binding. The 
temperature is then raised again to about 72°C for primer extension. 
Underscoring the importance of microbes, the thermophilic bacterium
Thermus aquaticus is the major source of the heat-tolerant DNA
polymerase, which catalyzes primer extension and facilitates PCR.
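
One cycle of the temperature program described above can be written
down as data. In this minimal sketch the three temperatures come from
the text, while the hold times and the names Step, CYCLE, and
build_program are hypothetical placeholders chosen for illustration.

```python
from typing import List, NamedTuple


class Step(NamedTuple):
    name: str
    temp_c: int
    seconds: int  # hypothetical hold time, not taken from the text


# One PCR cycle: separate the strands, let primers bind, extend primers.
CYCLE = [
    Step("denature", 95, 30),
    Step("anneal", 55, 30),
    Step("extend", 72, 60),
]


def build_program(cycles: int) -> List[Step]:
    """Repeat the three-step cycle the requested number of times."""
    return [step for _ in range(cycles) for step in CYCLE]


if __name__ == "__main__":
    program = build_program(30)
    total_minutes = sum(step.seconds for step in program) / 60
    print(f"{len(program)} steps, about {total_minutes:.0f} minutes")
```

Actual hold times depend on the polymerase, the primers, and the length
of the target, so the numbers above should not be read as a protocol.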

In order to amplify a particular gene, specific primers, unique to that 
gene, are used. Two oligonucleotide primers (oligos) are constructed 
that flank a region of interest. One oligo will be complementary to a 
region on one strand of DNA, and the other oligo will be 
complementary to a region downstream on the opposite strand.
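
To make the geometry of the two primers concrete, the sketch below finds
the stretch of a template that a primer pair would bracket. The template
and primer sequences are invented for illustration, and exact string
matching stands in for annealing, which in a real reaction tolerates
some mismatch.

```python
from typing import Optional

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the opposite strand, read 5' to 3'."""
    return seq.upper().translate(COMPLEMENT)[::-1]


def amplicon(template: str, forward: str, reverse: str) -> Optional[str]:
    """Return the region of `template` flanked by the two primers.

    `template` is one strand written 5' to 3'. The forward primer has
    the same sequence as part of that strand. The reverse primer anneals
    to that strand, so its binding site appears in `template` as the
    reverse complement of the primer. Returns None if either primer
    fails to find a site.
    """
    template = template.upper()
    start = template.find(forward.upper())
    if start == -1:
        return None
    site = reverse_complement(reverse)
    end = template.find(site, start + len(forward))
    if end == -1:
        return None
    return template[start:end + len(site)]


if __name__ == "__main__":
    # Hypothetical 30-base template and 7-base primers.
    demo = "GGATCCATGCAAGTCTTGACCGGTAAGCTT"
    print(amplicon(demo, "ATGCAAG", "TTACCGG"))
```

Only the segment between and including the two primer sites is copied,
which is why the choice of primers determines which gene is amplified.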

Back home, after your trip to the deep-sea hydrothermal vent, you 
want to determine what genus of bacteria you have in hand. You can 
use PCR to amplify the gene for ribosomal RNA (the gene isolated
and sequenced by Woese from so many organisms when he
constructed his "Tree of Life"), choosing conserved regions
of the rRNA gene as primer sites. With adequate DNA from PCR, you
could sequence the gene and compare it with millions of known rRNA 
gene sequences using a computer database. (See the Genomics unit.) 

Alternately, you might want to ask if a microbe carries out a particular 
form of metabolism. Given the DNA sequence for a protein involved in 
a particular metabolic strategy — photosynthesis, for example — you 
could construct oligos so that the presence of that gene could be 
detected using PCR. 

How does your microbe withstand the high temperatures of its 
volcanic environment? This has been a question posed by researchers 
studying extreme thermophiles for some time. Indeed, organisms have 
been found that tolerate temperatures as high as 110°C. Some archaea
produce unusually high concentrations of thermoprotective
proteins (heat shock proteins), which are found in all cells. These
proteins help refold partially denatured proteins. Other archaea 
produce unique proteins that help stabilize DNA. You could use PCR to 
detect the genes for such proteins in your samples. 

As the techniques of molecular genetics are applied to extreme 
environments we will come closer to understanding the wide variety of 
strategies that organisms use to survive on this planet... and perhaps 
on others. 

Microbes and the Carbon Cycle 

We have classified microorganisms, including archaea, based on their 
sources of energy and carbon. The cycling of carbon between carbon 
dioxide and organic compounds is of considerable ecological 
importance. In addition to eukaryotes (such as plants and algae), 
autotrophic bacteria (such as cyanobacteria) play an important role in 
the fixation of carbon dioxide into organic compounds. Consumers, in 
turn, use organic compounds and release carbon dioxide. 
Decomposition of plants and animals and their constituent organic 
compounds is carried out by a large number of bacteria and fungi. 

What is taking place in a swamp where you see marsh gas bubbling up
through the ooze? A carbon cycle, based on one-carbon compounds, is
taking place in the sediments and overlying water of such freshwater
environments. The anoxic sediments harbor archaea, which produce 
methane as a byproduct of energy metabolism. The methane rises 
from the sediment and moves into the zone above it. This upper area 
contains enough oxygen to support methane oxidizers, bacteria that 
use the methane as a source of carbon as well as an energy source. 

Methane (CH₄) is a greenhouse gas and, according to international
agreement, its emissions are controlled. Although it is produced by
burning fossil fuel, most enters the atmosphere because of microbial
action. How can the latter be limited? One strategy is to drain rice
paddies more often, limiting the action of methane producers.
Another is to add a layer of soil to landfills to encourage methane
oxidizers. Such approaches to reducing this harmful greenhouse gas
are being studied.

Figure 5. Methanogens are intolerant of oxygen, so they thrive in
anoxic sediments. The methane they produce is a carbon and an energy
source for methane oxidizers in the overlying water.



Microbes and the Cycling of Nitrogen 

Nitrogen is an important part of proteins and nucleic acids. This vital 
nutrient is recycled from organic compounds to ammonia, ammonium 
ions, nitrite, nitrate, and nitrogen gas by a variety of processes, many 
of which depend on microbes. Different organisms prefer nitrogen in 
different forms. The accompanying figure illustrates nitrogen cycling
(Fig. 6). Note that nitrification (the conversion of ammonium to
nitrite and nitrate) in soil is carried out by a small set of
specialized bacteria, typified by the genera Nitrosomonas and
Nitrobacter. Denitrification, the loss of nitrate from soil to form
gaseous nitrogen compounds (N₂O, NO, and N₂), is dependent on other
kinds of bacteria.

Figure 6. Bacteria are key to the cycling of nitrogen in ecosystems.
Different species are involved in decomposition and ammonification,
nitrification, denitrification, and nitrogen fixation.



Some prokaryotes are essential to the nitrogen cycle because of their 
role in nitrogen fixation, the conversion of nitrogen gas to ammonium 
ions. These ions can then be used to build amino acids. In aquatic 
environments, cyanobacteria are the most significant nitrogen fixers. In 
soil, some nitrogen-fixing bacteria are free-living, such as members of 
the genus Clostridium; others live in symbiotic relationships with
leguminous plants (such as peas and clover). Symbionts, such as 
Rhizobium, may contribute ten times more nitrogen to soils than free- 
living bacteria. As we shall see, these symbionts develop intimate 
relationships with their host plants that require complex 
communications. 

Biofilms 

We have formed many of our ideas about bacteria by studying pure
cultures: homogeneous populations growing in broths. In the wild,
however, microorganisms live alongside, in, or on other organisms and
often produce proteins not apparent in the laboratory. Bacteria
communicate chemically with their neighbors and respond to signals
they receive. An understanding of communication among bacteria,
including those within bacterial communities, is shaping medical
treatments and strategies for bacterial control and is providing a new
perspective on the interrelationships between species.

One form of bacterial community is the biofilm. An example is the 
coating of bacteria on your teeth. Biofilms are "living veneers" 
composed of microcolonies of bacteria, surrounded by a gooey 
extracellular matrix that the bacteria secrete. A network of water
channels provides nutrients and efficiently removes waste products for 
the bacteria on the surface. Deeper down, cells rely on diffusion for 
nutrient delivery and waste removal. Oxygen concentrations vary 
within a biofilm; cells buried deeper can be oxygen deprived. This 
variation in environment means that members of a biofilm community, 
even genetically identical individuals, differ in their metabolic states. In 
fact, those buried deep within the film are effectively dormant. 

"There's a real transformation that takes place and the 
bacteria start acting like a community ... a whole
different organism. And there are significant 
differences in the level of expression of genes within 
the biofilm because of the different environments within 
the microcolonies." Anne Camper, Center for Biofilm Engineering

Biofilms of Pseudomonas aeruginosa in the lungs of cystic fibrosis 
patients can be life threatening. The thick mucus that this inherited 
disorder produces provides a suitable environment for an infection to 
become established. However, this is not a simple infection. The 
bacteria organize themselves into a biofilm and, as they do, some 
become less susceptible to antibiotics. For patients, the result is a 
prolonged infection that is very difficult to treat. 

Why do bacteria in biofilms survive much higher concentrations of 
antibiotics and disinfectants than free-living organisms? One reason 
involves the dormant bacteria in the biofilm. Many antibiotics — 
penicillin, for example — act only on actively growing cells. Cells that 
were dormant can serve to reestablish a biofilm once the antimicrobial 
is no longer present. 

Another mechanism for survival is the layered nature of a biofilm. The 
effectiveness of a disinfectant, such as bleach, is depleted as it acts on 
outer layers of the film; bacteria located in inner layers may survive. A 
third mechanism for survival involves the generation of proteins that 
provide antimicrobial resistance, such as enzymes that inactivate 
hydrogen peroxide. Some biofilms are able to manufacture larger 
quantities of such enzymes so they become more resistant than 
planktonic (free-floating) bacteria. 

Biofilm Formation and Bacterial 
Communication 

How do biofilms form? The formation of a biofilm requires 
coordinated chemical signaling between cells. Unless an adequate 
number of neighboring cells are present, the costs of biofilm 
production to an individual bacterium outweigh the benefits. Thus, a 
signaling process benefits a bacterium by allowing it to sense the
presence of neighboring bacteria and respond to varying conditions. 
The process by which a bacterium does this is called quorum sensing. 

Quorum sensing uses signaling molecules, known as autoinducers. 
These are continuously produced by bacteria and can readily diffuse 
through the cell membrane. When elevated numbers of bacteria are 
present in an area, the concentration of autoinducers in the region will 
be higher. Autoinducer molecules (which include certain peptides and
compounds known as homoserine lactones) can interact, by way of
regulatory proteins, with specific repressor or activator sequences in
DNA. The presence or absence of the autoinducer thus controls the
production of mRNA and, therefore, protein. These proteins are encoded
by dozens of genes, including the genes for biofilm production.
Laboratory strains of P. aeruginosa lacking the gene for a specific
homoserine lactone will not develop into normal biofilms but pile up
into a disorganized heap.

Figure 7. Bacterial cells enmeshed in extracellular matrix material,
creating a biofilm.
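
The logic of quorum sensing can be captured in a toy model. The sketch
below assumes that every cell releases autoinducer at a constant rate,
that the signal is lost at a rate proportional to its concentration,
and that genes switch on once the signal passes a threshold. All of the
numbers, units, and function names are hypothetical and serve only to
illustrate the idea.

```python
def steady_state_signal(cell_density: float,
                        production_per_cell: float = 1.0,
                        loss_rate: float = 0.5) -> float:
    """Signal level at which release by the cells balances loss.

    Release = production_per_cell * cell_density
    Loss    = loss_rate * signal
    Setting the two equal gives the value returned here.
    """
    return production_per_cell * cell_density / loss_rate


def quorum_reached(cell_density: float, threshold: float = 100.0) -> bool:
    """True when the steady-state signal meets the response threshold."""
    return steady_state_signal(cell_density) >= threshold


if __name__ == "__main__":
    for density in (5, 25, 50, 200):
        state = "ON" if quorum_reached(density) else "off"
        print(f"density {density:>3}: biofilm genes {state}")
```

Because the signal rises in step with cell density in this model, a lone
cell never crosses the threshold, while a crowded population does. That
is the sense in which each cell "counts" its neighbors.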

From the bacteria's perspective, intercellular signaling has many
advantages. Often, microbes produce antibiotics that inhibit the
growth of competing species. Intercellular signaling not only brings
bacteria together in biofilms, it also regulates the coordinated delivery 
of high doses of these antibiotics from the denser bacterial population. 
It also helps bacteria coordinate the release of virulence factors (such 
as disease-causing toxins) to overcome animal or plant defenses. 
Signals between bacteria in close proximity, as in a biofilm, also seem 
to enhance bacterial mating and the acquisition of novel DNA by 
transformation, both of which increase bacterial diversity. 

Impact of Biofilms on Humans 

What is the impact of biofilms on humans? Most are benign, like the 
slippery coating on a rock in a stream, but others can cause serious 
problems. For example, biofilms contribute to corrosion in metal 
piping and can reduce the flow of fluids necessary for many industrial 
applications, including power generation. A particular concern is the 
contamination of medical devices such as urinary catheters, 
hemodialysis equipment, and medical and dental implants. Biofilms 
that develop on these devices can increase the risk of patient infection. 
The recognition that biofilm formation contributes to disease extends 
beyond the Pseudomonas infections suffered by cystic fibrosis patients. 
Tuberculosis, Legionnaires' disease, periodontal disease, and some
infections of the middle ear are just a few examples of diseases that 
involve the formation of biofilms. The Centers for Disease Control and 
Prevention estimates that biofilms account for two-thirds of the 
bacterial infections that physicians encounter. 

Several strategies can be used for attacking biofilms. For example, one 
might interfere with the synthesis of the extracellular matrix that 
holds the film together. Scientists are investigating coating medical 
devices with chemicals that inhibit matrix formation. Another strategy 
involves inhibiting the adherence of biofilm cells to their substrate. 
Identifying chemicals that bind to cell surfaces, stopping the 
formation of biofilms before they begin, is also an ongoing interest of
researchers. Targeting the molecules that biofilm bacteria use to 
communicate is a third tactic. 

In 1995 Peter Steinberg of the University of New South Wales, 
Australia, realized that the fronds of a red alga growing in Botany
Bay are rarely covered with biofilms. He determined that the alga
produces substituted furanones, chemicals that resemble the acylated
homoserine lactones necessary for bacterial communication. Evidently, 
the furanones bind to bacterial cells, thereby blocking the ability of 
the cells to receive the signals for quorum sensing. Although these 
compounds are too toxic for human use, similar compounds are being 
investigated for inhibiting the Pseudomonas biofilms that form in 
cystic fibrosis patients. 

Communication Between Bacteria and
Eukaryotes

Bacteria also communicate with plants and animals. One striking 
example involves the Rhizobium bacterium, which helps fix nitrogen for
legumes (such as pea and clover plants). This bacterium colonizes root
hairs in specialized nodules built by the plant. Before the plant and
bacteria ever come into contact, they are communicating. The plant
sends out chemical signals, known as flavonoids, which penetrate
Rhizobium cells and stimulate a gene-activating protein. The protein
then switches on bacterial genes so that signaling molecules known as
Nod factors are produced. Nod factors then stimulate the plant to form
nodules.

Another example is the signaling between the luminous bacterium
Vibrio fischeri and its host, the squid Euprymna scolopes. These
bacteria colonize a specialized light organ on the squid, providing 
camouflage. The squid is a nocturnal forager; luminescence from the 
bacteria erases the shadow that would normally be cast from above by 
the moon's rays. Quorum sensing molecules allow the bacteria to turn 
on light production only when the colony has reached adequate 
density. However, the bacteria do not just communicate with one
another; their chemical signals spur maturation of the light organ.
Hatchling squid raised in sterile seawater do not develop the pouch
that eventually houses the bacteria.

Like the Dr. Dolittle of fiction, who had the remarkable ability to talk
with animals, scientists of the future will continue to study the
language of microbes.



Figure 8. The squid Euprymna 
scolopes (left) and its light organ 
(right). The luminous bacterium Vibrio 
fischeri colonizes the light organ, 
providing camouflage to the squid. 

Microbes in Mines

Pyrite (FeS₂), otherwise known as "fool's gold," may not look like lunch
to you but it does to the chemolithotrophic bacterium Acidithiobacillus
ferrooxidans (formerly Thiobacillus ferrooxidans). These bacteria
extract energy from the oxidation of ferrous ions (Fe²⁺) to ferric ions
(Fe³⁺). Pyrite is one of the most common forms of iron in nature, and is
very common in bituminous coals and in many ore bodies. When pyrite
is exposed, as in a mining operation, it reacts with oxygen and water
to generate ferrous ions, sulfate, and hydrogen ions.

2FeS₂ + 7O₂ + 2H₂O → 2Fe²⁺ + 4SO₄²⁻ + 4H⁺

Ferrous ions: lunch! And the hydrogen ions generated in this
reaction do not faze A. ferrooxidans. This acidophile prefers a pH
below 3.5. It is able to maintain a relatively neutral internal pH by
actively pumping protons out of the cytoplasm into the external
environment against a steep pH gradient.

Acid mine drainage, which causes serious ecological damage to rivers 
and lakes, is in part a result of the presence of A. ferrooxidans. The 
ferric ions generated by the bacteria are soluble in the acid 
environment and easily react with additional pyrite. 

FeS₂ + 14Fe³⁺ + 8H₂O → 15Fe²⁺ + 2SO₄²⁻ + 16H⁺

The additional acid that is formed from this reaction is just one of 
the resultant pollutants. The ferric ions (Fe³⁺) that are generated
precipitate in a complex mineral called jarosite [HFe₃(SO₄)₂(OH)₆].
The unsightly stains in mine drainages, called "yellow boy" by U.S. 
miners, are jarosite. 
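
Both pyrite reactions can be checked for balance by counting atoms and
charge on each side. The sketch below does this in Python, writing
water explicitly as H2O on the reactant side. The species table and
function names are illustrative assumptions rather than any standard
chemistry library.

```python
from collections import Counter

# Each species maps to (atoms per formula unit, net charge).
SPECIES = {
    "FeS2":   ({"Fe": 1, "S": 2}, 0),
    "O2":     ({"O": 2}, 0),
    "H2O":    ({"H": 2, "O": 1}, 0),
    "Fe2+":   ({"Fe": 1}, 2),
    "Fe3+":   ({"Fe": 1}, 3),
    "SO4 2-": ({"S": 1, "O": 4}, -2),
    "H+":     ({"H": 1}, 1),
}


def totals(side):
    """Sum atoms and charge over a list of (coefficient, species)."""
    atoms, charge = Counter(), 0
    for coeff, name in side:
        elements, q = SPECIES[name]
        for element, count in elements.items():
            atoms[element] += coeff * count
        charge += coeff * q
    return atoms, charge


def is_balanced(reactants, products):
    """True when both sides carry the same atoms and the same charge."""
    return totals(reactants) == totals(products)


# Oxidation of exposed pyrite by oxygen.
INITIAL = ([(2, "FeS2"), (7, "O2"), (2, "H2O")],
           [(2, "Fe2+"), (4, "SO4 2-"), (4, "H+")])

# Attack on further pyrite by ferric ions.
PROPAGATION = ([(1, "FeS2"), (14, "Fe3+"), (8, "H2O")],
               [(15, "Fe2+"), (2, "SO4 2-"), (16, "H+")])

if __name__ == "__main__":
    print(is_balanced(*INITIAL), is_balanced(*PROPAGATION))
```

The second reaction releases sixteen hydrogen ions for every unit of
pyrite consumed, which is why the ferric-ion pathway contributes so
heavily to the acidity of mine drainage.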

Acid mine drainage and its associated pollution do not form unless
pyrite is exposed to oxygen. Only upon mining does the initial 
reaction generating ferrous ions provide an environment in which 
A. ferrooxidans will thrive. 

To reduce toxic metal content in acid mine drainage, scientists are 
turning to sulfate-reducing bacteria, which occur naturally in anoxic 
soils. These bacteria use sulfate as an electron acceptor instead of 
oxygen in a form of metabolism known as anaerobic respiration. 
Hydrogen sulfide is generated in the process. At a bioremediation site
in southeast Idaho, Dan Kortansky and his colleagues set up a series of
ponds separated by berms (embankments) of crushed limestone, straw,
and manure. The goal was to convert the sulfate in the drainage to
sulfide. This reacts with the dissolved metals to form metal sulfides,
such as ferrous sulfide. The limestone in Kortansky's bioremediation
system raises the pH as metal-laden water passes through the berms
and sulfate-reducing bacteria thrive. Results have been encouraging.
Iron concentrations in the drainage at this site were reduced by 65% and
copper residues were reduced by nearly 100%.

Microbial Leaching of Ores 

Pyrite is not the only mineral oxidized by A. ferrooxidans. Metals, such
as copper, are often present in ores as sulfides. A. ferrooxidans can
convert the sulfide chalcocite (Cu₂S) to covellite (CuS) to obtain energy.
Copper miners take advantage of this metabolic step during the
microbial leaching of low-grade ores. Cu₂S is insoluble but can be
converted, by a series of steps (some of which involve the bacteria), to
soluble Cu²⁺ ions. Copper metal (Cu⁰) is then recovered when water, rich
in copper ions, is passed over metallic iron in a long flume (Fe⁰ + Cu²⁺
→ Cu⁰ + Fe²⁺).
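
Because one iron atom is consumed for each copper atom recovered, the
amount of scrap iron needed follows from the molar masses alone. The
sketch below works through that arithmetic. The function name is an
illustrative assumption, and the calculation ignores side reactions
that consume extra iron in practice.

```python
FE_MOLAR_MASS = 55.845  # grams per mole of iron
CU_MOLAR_MASS = 63.546  # grams per mole of copper


def iron_consumed(copper_grams: float) -> float:
    """Grams of metallic iron dissolved to recover `copper_grams` of copper.

    The displacement reaction exchanges iron for copper one to one
    in moles.
    """
    moles_cu = copper_grams / CU_MOLAR_MASS
    return moles_cu * FE_MOLAR_MASS


if __name__ == "__main__":
    print(f"{iron_consumed(1000.0):.1f} g of iron per kg of copper")
```

Under this idealized assumption, recovering one kilogram of copper
dissolves somewhat less than a kilogram of iron.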

In heap leaching, a dilute sulfuric acid solution is percolated through 
crushed low-grade ore that has been stacked onto an impervious pad. 
The liquid coming out of the bottom of the pile, rich in copper ions, is
collected and the metal is precipitated by contact with iron (as above). 
The liquid is then recycled by pumping it back over the pile. Three 
different oxidation reactions take place within the ore pile: 

1. Cu₂S + O₂ → CuS + Cu²⁺ + H₂O is accomplished by bacteria

2. CuS + 2O₂ → Cu²⁺ + SO₄²⁻ is accomplished by both chemical
and biological processes

3. CuS + 8Fe³⁺ + 4H₂O → Cu²⁺ + 8Fe²⁺ + SO₄²⁻ + 8H⁺ is a chemical
reaction

The resultant Cu²⁺ is recovered from the solution when it reacts with
iron, and the Fe²⁺ that enters the solution is oxidized (again by A.
ferrooxidans) to Fe³⁺. After oxidation this solution is delivered once
again to the ore heap. The last oxidation, dependent on the bacteria,
provides the Fe³⁺ that drives step 3.

The mining industry has increased its use of biological leaching for
various reasons, including environmental concerns related to smelting, 
the decline in the quality of ore reserves, and difficulties in processing. 
This new interest has motivated increased research. We now know 
that ore heaps contain a much wider range of organisms than 
previously thought. In fact, there is a succession of microbial 
populations that occurs during the leaching of sulfide minerals. 
Heterotrophic acidophiles belonging to the genera Acidiphilium 
and Acidocella are found frequently, often in close association with 
A. ferrooxidans. These heterotrophic species probably scavenge 
organic molecules that are metabolic byproducts of the 
chemolithotrophs. Perhaps this association is detrimental, or perhaps 
it helps A. ferrooxidans thrive by removing wastes. 

Research continues into the composition of bacterial communities that 
occur naturally in bioleaching activities. Because ore heaps get quite 
hot during bioleaching, scientists are also asking whether novel 
bacteria — perhaps thermophiles from Yellowstone or deep-sea vents 
— might be seeded onto heaps to provide more efficient biomining. 

Coda 

There are about 5,000 known species of prokaryotes, but scientists 
estimate that true diversity could range between 400,000 and 
4 million species. Each has adapted to its particular environment and 
each performs many roles. Some of these roles are essential to 
sustaining entire ecosystems. But what is a prokaryotic species? 
Microbes, which reproduce asexually, cannot be thought of in terms 
of reproductive isolation. The advent of molecular genetics has 
brought with it new approaches to defining the concept of species. 
Some bacteriologists are differentiating prokaryotic species based on 
their rRNA sequences. If organisms possess rRNA sequences that differ 
by more than a certain proportion (usually three percent), these 
bacteriologists consider them to be different species. As new 
molecular genetic approaches to the study of microbes are developed, 
scientists will find additional ways of describing the vast diversity of 
organisms that make up a parallel, albeit invisible, part of our world. 
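
The rRNA criterion amounts to a simple calculation once two sequences
have been aligned. The sketch below assumes the sequences are already
aligned to equal length, with "-" marking gaps. The three percent
cutoff comes from the text, while the function names and the choice to
skip gapped positions are illustrative assumptions.

```python
def percent_difference(seq_a: str, seq_b: str) -> float:
    """Percent of compared positions at which two aligned sequences differ.

    Positions where either sequence has a gap ("-") are skipped.
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    compared = 0
    mismatches = 0
    for a, b in zip(seq_a.upper(), seq_b.upper()):
        if a == "-" or b == "-":
            continue
        compared += 1
        if a != b:
            mismatches += 1
    if compared == 0:
        raise ValueError("no comparable positions")
    return 100.0 * mismatches / compared


def same_species(seq_a: str, seq_b: str, cutoff_percent: float = 3.0) -> bool:
    """Apply the cutoff: more than `cutoff_percent` means different species."""
    return percent_difference(seq_a, seq_b) <= cutoff_percent
```

In practice the alignment step is the hard part and is done with
dedicated software. The threshold itself is a convention, not a natural
boundary.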

Glossary. 



Anaerobic respiration. A pathway
of energy metabolism in which an
alternate electron acceptor
replaces oxygen. Organisms
undergoing anaerobic respiration
reduce compounds such as nitrite
(NO₂⁻), nitrate (NO₃⁻), sulfate
(SO₄²⁻), or carbon dioxide (CO₂)
instead of oxygen.

Archaea. A domain of 
prokaryotic organisms that differ 
from bacteria. In contrast to 
bacteria, archaea lack cell wall 
peptidoglycan, contain histone- 
like proteins, and possess 
chemically distinct cell membrane 
phospholipids. 

Autoinducers. Molecules 
involved in quorum sensing that 
regulate mRNA production for 
specific genes in response to 
population density. 

Biofilm. A multilayered bacterial 
population embedded in a 
polysaccharide matrix and 
attached to some surface. 

Chemolithoautotroph. An
organism that obtains energy
from inorganic compounds and
carbon from CO₂.

Chemoorganotroph. An
organism that obtains energy
from the oxidation of organic
compounds.

dsDNA. Double-stranded DNA. 
A DNA molecule in which two 
chains (backbones of alternating 
sugars and phosphates) are linked 
together by hydrogen bonding 
between complementary bases. 

Eukarya. The domain of all 
eukaryotic organisms. Eukaryotes 
are single or multicellular 
organisms with cells that have a 
membrane-enclosed nucleus and 
usually other organelles. 



Extremophiles. Organisms that 
thrive in what humans consider 
extreme conditions — very salty, 
hot, cold, acidic, or basic 
conditions — or at high pressure 
(such as in the depths of the sea). 

Lateral gene transfer. Also 
referred to as horizontal gene 
transfer. The transmission of genes 
directly between organisms, 
particularly bacteria, and not from 
parent to offspring. 

Methanogenic. Methane producing.

Oligonucleotide (oligo). A short,
single-stranded DNA molecule
consisting of a defined sequence
of nucleotides. Used to initiate
DNA replication in PCR.

Polymerase chain reaction 
(PCR). A technique that uses DNA 
polymerase to amplify the amount 
of DNA in a sample. 

Quorum sensing. A process 
by which a bacterium detects 
the density of other bacteria 
in an area. 

ssDNA. Single-stranded DNA. 
A DNA molecule consisting of only 
one chain of alternating sugars 
(deoxyribose) and phosphates. 

Stromatolites. Fossilized 
microbial mats consisting of layers 
of filamentous prokaryotes and 
trapped sediment. 

Thermophile. Organisms that 
grow optimally above 45°C. 
Hyperthermophiles grow 
optimally above 80°C. 

Thermoprotective proteins.
Proteins that help bacteria survive
heat. Some thermoprotective 
proteins help refold partially 
denatured proteins. Others help 
stabilize DNA. 

"To comprehend the interactions between Homo sapiens
and the vast and diverse microbial world, perspectives
must be forged that meld such disparate fields as
medicine, environmentalism, public health, basic
ecology, primate biology, human behavior, economic
development, cultural anthropology, human rights law,
entomology, parasitology, virology, bacteriology,
evolutionary biology, and epidemiology." L. Garrett¹

During the mid-1900s, most scientists and policy makers were shifting
their attention away from infectious disease as vaccines made polio 
and several other diseases rare, at least in the developed world. 
Through an intense vaccination campaign, researchers at the World 
Health Organization (WHO) had eradicated smallpox from the world
by the late 1970s. Most people expected that the eradication of other
diseases would follow. In the meantime, scientists had created a large 
array of antibiotics that could easily treat many of the great scourges 
of history, from leprosy to tuberculosis. Infectious diseases appeared to 
be on the way out. 

This optimistic picture has since changed. Legionnaires' disease,
hantavirus, AIDS (acquired immunodeficiency syndrome), West Nile 
virus, and SARS (severe acute respiratory syndrome) have rocked the 
public health and scientific communities. New, drug-resistant strains 
of bacteria have appeared. Tuberculosis and other old diseases, once 
thought contained, are again a public health concern. In some of 
these cases the disease-causing agent was previously undescribed. 
For others, a previously treatable pathogen somehow changed. In
addition, completely new threats emerged. Where had these new
threats come from?

Why Do Diseases Emerge? 

Many factors contribute to the emergence of disease; outbreaks of 
existing diseases or the emergence of new ones typically involve 
several of these factors acting simultaneously. Predicting and 
controlling emerging infection ultimately requires coming to terms 
with biocomplexity — the elaborate interrelationships between 
biological systems (including human social systems) and their 
physical environments. 

Table 1. Factors that affect the emergence of disease (Smolinski et al.)²

Human behavior and demographics
Microbial adaptation and change
International travel and commerce
Human susceptibility to infection
Technology and industry
Changing ecosystems
Climate and weather
Breakdown of public health measures
Poverty and social inequality
Economic development and land use
War and famine
Lack of political will
Intent to harm

The Human Body as an Ecosystem 

The human body is inhabited by billions of bacteria. In fact, we 
normally carry ten times more prokaryotic than eukaryotic cells. Our 
mouths alone are host to four hundred identified — and probably 
hundreds more unidentified — species of bacteria. Most bacteria are 
benign to their host; some even provide valuable services. For example,
bacteria in the gut aid digestion and generate vitamins used by their 
human hosts. 

The bacteria we possess are an ecological community; thus, the 
principles of community ecology and evolution are vital in 
understanding how these bacteria (both the benign and the 
potentially harmful) live within us. Each bacterial species is adapted to 
the habitat and ecological niche it fills, existing in somewhat of an 
ecological balance. This balance helps thwart the invasion of 
pathogens, which must compete with resident bacteria for nutrients 
and space. Resident bacteria also produce antimicrobial proteins called 
bacteriocins, which inhibit the growth of related species. 

Disruption of the normal flora shifts the mix of microbiota
and can lead to disease. For example, the use of some broad-spectrum
antibiotics can dramatically decrease the numbers of bacteria in the 
colon. In this situation Clostridium difficile, normally present only in 
low numbers, can overgrow. This bacterium produces toxins that cause 
potentially fatal damage to the lining of the colon. In the few 
individuals that normally harbor the microbe, normal levels of other 
bacteria keep C. difficile numbers low. It is only when the balance is
disrupted that such a "superinfection" occurs. 

The Emergence of
Antibiotic-Resistant Bacteria

Today we face a growing medical crisis: the emergence of bacteria 
resistant to multiple antibiotics. Strains of at least three potentially 
fatal bacterial species are now resistant to all the drugs available for 
treatment. Enterococcus faecalis is generally a benign intestinal 
bacterium. In the elderly and individuals with compromised immune 
systems, however, it can be deadly if it gets in the wrong location. E. 
faecalis can infect heart valves and other organs, causing a deadly 
systemic disease. Strains of Pseudomonas aeruginosa (which causes skin 
infections and deadly septicemia) and Mycobacterium tuberculosis 
(the causative agent of tuberculosis) also evade available drugs. Death 
rates for tuberculosis have begun to rise, in part because of the 
evolution of these new strains. 

The wide use, and misuse, of antibiotics has encouraged new strains of 
pathogens to develop. For example, the widespread use of 
cephalosporin antibiotics has led to drug-resistant E. faecalis. The use 
of vancomycin (a drug of last resort) has contributed to the
development of VRE (vancomycin-resistant Enterococcus), which defies 
treatment. Antibiotic-resistant bacteria are generally not more potent 
and do not generate a more severe disease state; they are, however, 
more difficult to treat. Resistant strains come to dominate when a
population of microbes containing both susceptible and resistant
bacteria is exposed to an antibiotic within the host: susceptible
bacteria succumb and resistant bacteria proliferate.

Public health officials urge people to complete the full course of 
antibiotic treatment. Why? Bacterial susceptibility to an antibiotic is 
often dose-dependent; an individual bacterium that is only somewhat 
resistant may survive at low drug concentrations. Selection favors
these more-resistant bacteria, which will eventually predominate in the
bacterial population. Thus, the failure of patients to
complete a full course of treatment, or the use of less than therapeutic 
doses of antibiotic, can lead to resistant strains. The full course of 
treatment should be sufficient to wipe out all the pathogenic bacteria. 
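
The dose-and-duration argument above can be made concrete with a small
Python sketch. It is a toy model, not a clinical one: the strain names,
starting counts, and daily survival fractions are all hypothetical values
chosen for illustration, and the model ignores regrowth and the immune
system.

```python
def simulate_course(counts, daily_survival, days):
    """Return the expected number of cells of each strain after `days` of treatment.

    Each day the drug leaves a fixed fraction of each strain alive.
    """
    return {strain: n * daily_survival[strain] ** days
            for strain, n in counts.items()}


def tolerant_share(counts):
    """Fraction of the surviving population that is partially resistant."""
    total = sum(counts.values())
    return counts["tolerant"] / total if total else 0.0


# Hypothetical starting population and daily survival fractions (assumptions).
start = {"susceptible": 1_000_000, "tolerant": 1_000}
survival = {"susceptible": 0.1, "tolerant": 0.5}

if __name__ == "__main__":
    for days in (0, 3, 14):
        after = simulate_course(start, survival, days)
        print(days, {k: round(v, 3) for k, v in after.items()},
              round(tolerant_share(after), 4))
```

Under these assumed numbers, stopping after three days leaves a population
enriched in partially resistant cells, whereas fourteen days drives the
expected count of both strains below one cell.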

An additional cause of the rise of antibiotic resistance is the use of
antibiotics in animal feed, which selects for resistant bacteria in
livestock. Humans may be exposed to such bacteria by
handling intestinal contents of the animals, such as when butchering
or preparing meats. Moreover, bacteria from livestock can get into our
water systems.

Mechanisms of Resistance 

Various adaptations provide bacteria with antibiotic resistance. 
Mutations in a target protein that affect binding of an antibiotic to 
that protein may confer resistance. If an antibiotic inhibits a metabolic 
pathway and an alternate one becomes available, resistance can occur. 
Some antibiotic-resistant bacteria make enzymes that destroy drugs; 
others alter pores in the cell membrane so an antibiotic can no longer 
enter. Some resistant strains have developed mechanisms for actively 
pumping antibiotics out of the bacterial cell. The genes for antibiotic 
resistance are sometimes found on plasmids. The transfer of these 
plasmids among bacteria facilitates the spread of antibiotic resistance 
within and between bacterial populations. 

Microbial Adaptation and Change

The evolution of novel microbes, including antibiotic-resistant strains, 
depends on diverse members of microbial populations that can thrive 
in new conditions. Microbes have incredible abilities to change their 
genetic make-up and evolve faster than their hosts do. Multiple 
mechanisms ensure the diversity that allows for expansion. 

The production of a single, novel gene product may be the key to 
bacterial survival; however, several gene products working together 
sometimes provide the advantage. Mutation generates new genes but, 
unlike higher eukaryotes, bacteria do not undergo sexual 
reproduction; the typical bacterium simply grows, replicates its DNA, 
and divides. Therefore, bacterial reproduction does not provide a 
mechanism for generating progeny with new combinations of genes. 
How, then, do bacteria obtain new gene assortments, some of which
may provide a survival advantage?

Lateral Gene Transfer 

Bacteria possess several methods for lateral gene transfer (also 
called horizontal gene transfer), the transmission of genes between 
individual cells. These mechanisms not only generate new gene 
assortments, they also help move genes throughout populations and 
from species to species. The methods include transformation, 
transduction, and conjugation. 

Transformation involves the uptake of "naked" DNA (DNA not 
incorporated into structures such as chromosomes) by competent 
bacterial cells (Fig. 1). Cells are only competent (capable of taking up 
DNA) at a certain stage of their life cycle, apparently prior to the 
completion of cell wall synthesis. Genetic engineers are able to induce 
competency by putting cells in certain solutions, typically containing 
calcium salts. At the entry site, endonucleases cut the DNA into 
fragments of 7,000-10,000 nucleotides, and the double-stranded DNA 
separates into single strands. The single-stranded DNA may recombine 
with the host's chromosome once inside the cell. This recombination 
replaces the gene in the host with a variant — albeit homologous — 
gene. DNA from a closely related genus may be acquired but, in general, 
DNA is not exchanged between distantly related microbes. Not all 
bacteria can become competent. While transformation occurs in nature, 
the extent to which it contributes to genetic diversity is not known. 

Transduction is another method for transferring genes from one 
bacterium to another; this time the transfer is mediated by 
bacteriophages (bacterial viruses, also called phages) (Fig. 2). A 
bacteriophage infection starts when the virus injects its DNA into a 
bacterial cell. The bacteriophage DNA may then direct the synthesis of 
new viral components assembled in the bacterium. Bacteriophage DNA 
is replicated and then packaged within the phage particles. Early in the 
infective cycle the phage encodes an enzyme that degrades the DNA of 
the host cell. Some of these fragments of bacterial DNA are packaged 
within the bacteriophage particles, taking the place of phage DNA. 
The phage can then break open (lyse) the cell. When released from the 
infected cell, a phage that contains bacterial genes can continue to 
infect a new bacterial cell, transferring the bacterial genes. Sometimes 
genes transferred in this manner become integrated into the genome
of their new bacterial host by homologous recombination. Such
transduced bacteria are not lysed because they do not contain
adequate phage DNA for viral synthesis. Transduction occurs in a wide
variety of bacteria and is a common mechanism of gene transfer.

Figure 1. Bacterial Transformation

1. Naked DNA fragments from disintegrated cells are present in the
area of a potential recipient cell. This cell must be of the correct
genus and be in a state of competence, allowing the entry of the DNA
fragments.

2. Entry of naked DNA into competent cell.

3. Recombination. Some DNA fragments replace (recombine with)
original host cell DNA. The resultant recombinant cell will now
express the foreign genes it has received and pass them on to all its
offspring. DNA that has not recombined is broken down by enzymes.



Figure 2. Transduction by bacteriophage

Bacterial host #1: Phage DNA enters the cell, and the empty phage coat
remains on the outside of the bacterium. When a phage infects a host
cell, it may cause the degradation of host DNA into small fragments.
Phage coat proteins are synthesized and phage DNA is replicated.
During maturation of the virus particles, a few phage heads may
envelop fragments of bacterial DNA instead of phage DNA. Only
bacterial DNA is present in the transducing virions. The phage
carrying the bacterial DNA infects another cell, transferring the
bacterial DNA into the new cell.

Bacterial host #2: When this bacterial DNA is introduced into a new
host cell, it can become integrated into the bacterial chromosome,
thereby transferring several bacterial genes at one time. Bacteria
multiply with new genetic material.

Some bacteriophages contribute to the virulence of bacterial
infections. Certain phages can enter an alternate life cycle called
lysogeny. In this cycle, all the virus's DNA becomes integrated into the
genome of the host bacterium. The integrated phage, called a
prophage, can confer new properties to the bacterium. For example,
strains of Corynebacterium diphtheriae that have undergone
lysogenic conversion synthesize the diphtheria toxin, which damages
human cells. Clostridium botulinum and Streptococcus pyogenes, when
lysogenized by certain phages, also manufacture toxins responsible for
illness, causing botulism and scarlet fever respectively. Strains lacking
the prophage do not produce the damaging toxins.

Conjugation is another means of gene transfer in many species of 
bacteria (Fig. 3). Cell-to-cell contact by a specialized appendage, known 
as the F-pilus (or sex pilus), allows a copy of an F- (fertility) plasmid to 
transfer to a cell that does not contain the plasmid. On rare occasions 
an F-plasmid may become integrated in the chromosome of its 
bacterial host, generating what is known as an Hfr (high frequency of 
recombination) cell. Such a cell can also direct the synthesis of a sex 
pilus. As the chromosome of the Hfr cell replicates it may begin to cross 
the pilus so that plasmid and chromosomal DNA transfers to the 
recipient cell. Such DNA may recombine with that of its new host, 
introducing new gene variants. Plasmids encoding antibiotic-resistance 
genes are passed throughout populations of bacteria, and between 
multiple species of bacteria by conjugation. 

Lateral gene transfer is a potent evolutionary force that can create
diversity within bacterial species. (See the Microbial Diversity unit.) As
genes for virulence factors and antibiotic resistance spread between 
and among bacterial populations, scientists are realizing how integral 
these mechanisms are to the emergence of novel pathogens. 



Figure 3. The F-pilus serves as a point 
of contact between a bacterium 
containing an F-plasmid (the "male") 
and a bacterium lacking the plasmid 
(the "female"). After the female cell is 
contacted the pilus retracts, pulling the 
cells together. The exact mechanism of 
DNA transfer from male to female is not 
known; it may be by a channel in the 
pilus or by a temporary fusion of the 
mating cells. 

Transposons (transposable elements) are genes that can move
("jump") from one DNA molecule to another in a cell, or from one 
location to another on the same DNA molecule. They can facilitate the 
transfer of genes, such as antibiotic-resistance genes, from the 
chromosome of a bacterium to a plasmid. They also can contribute to 
genetic diversity by causing mutations. 

The simplest type of transposon is an insertion sequence (IS). It is a 
sequence of DNA that encodes an enzyme called transposase, which 
enables the IS to move. The transposase gene is flanked on either 
side by fifteen to twenty-five base pairs, arranged as "inverted 
repeats." A composite transposon is composed of any gene 
sandwiched between two IS sequences; this entire unit will move. 
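
The layout of an insertion sequence can be sketched in Python. In the
hedged example below, the 17-base repeat and the stand-in "transposase
gene" are made-up sequences and the function names are illustrative. The
check simply asks whether one end of an element is the reverse complement
of the other, which is what "inverted repeat" means on double-stranded
DNA.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the sequence of the opposite DNA strand, read 5' to 3'."""
    return seq.translate(COMPLEMENT)[::-1]


def has_inverted_repeats(element, repeat_len):
    """True if the last `repeat_len` bases are the reverse complement of the first."""
    return element[-repeat_len:] == reverse_complement(element[:repeat_len])


# Made-up sequences for illustration only.
LEFT_REPEAT = "GGAAGGTGCGAATAAGC"        # 17 bases, within the 15-25 range
TRANSPOSASE_GENE = "ATGACCGATTTGCATTAA"  # stand-in, not a real gene
INSERTION_SEQUENCE = (
    LEFT_REPEAT + TRANSPOSASE_GENE + reverse_complement(LEFT_REPEAT)
)

if __name__ == "__main__":
    print(has_inverted_repeats(INSERTION_SEQUENCE, len(LEFT_REPEAT)))
```

A composite transposon could be modeled the same way, as any gene placed
between two such insertion sequences.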

Travel, Demographics, and Susceptibility 

Bacteria move readily from person to person; global travel has 
contributed significantly to the dissemination of novel pathogens, 
including drug-resistant strains. Stuart Levy refers to antibiotics as 
"societal drugs." They not only affect the bacteria in a treated 
individual, but also produce long-lasting changes in the kinds and 
proportions of bacteria in the environment and in human populations 
at large. For example, multidrug-resistant Streptococcus
pneumoniae (a bacterium that causes pneumonia and meningitis) has
migrated from Spain to the United Kingdom, the United States, and 
South Africa. 

Crowding also contributes to the dissemination of novel pathogens. 
Hospitals and nursing homes are particularly ideal environments for 
the exchange of microbes, including drug-resistant strains. Every year 
two million people acquire infections while hospitalized and 77,000 
people die. Healthy caregivers and visitors can be unwitting carriers, 
but the scenario is worsened by the compromised status of patients. 
Cancer treatments and other immunosuppressives, such as those used 
for transplant patients, contribute to the problem. So does HIV. But 
any procedure, such as surgery or catheterization, that breaches the 
protective barrier of the skin increases the risk of infection. In crowded
cities, especially in third world countries where adequate sanitation
may be lacking, microbes arrive with immigrants from diverse locations.
These bacteria can spread rapidly, particularly when immunizations 
and health care are unavailable. 

New Technologies 

The evolution of new pathogens is not just a function of human- 
pathogen or human-human interactions. Sometimes people also 
unwittingly provide new environments where disease-causing 
organisms thrive. In the 1970s, for example, air-conditioning systems 
became widely available. A bacterium normally found in fresh water 
lakes, Legionella pneumophila, moved into the systems, gaining access 
to susceptible humans. The result was a previously unreported 
respiratory infection. 

Animal Reservoirs

Scientists have identified more than one hundred species of 
pathogenic bacteria that can infect both humans and animals. As you 
might imagine, zoonoses (diseases that can be transmitted to humans 
from other vertebrate hosts) are harder to eradicate. For example, 
Lyme disease is a zoonosis that has emerged, in part, because of 
human alteration of ecosystems. (See the Biodiversity unit.) A recent 
example of a probable zoonosis is SARS, which has been found in the 
civet cat and other animals. 

Influenza 

An average of about 36,000 Americans die each year as a result of 
influenza. The "Spanish Flu" of 1918-19 killed more people worldwide
than did World War I. This disease involves the interaction of multiple 
animal hosts; however, the story is more complicated. Variation among 
influenza viruses occurs at the level of the hemagglutinin (HA) and 
neuraminidase (NA) spikes, which cover the viruses' outer envelopes. 
These proteins are important for the attachment, and eventual release, 
of the virus from host cells. In response to an infection the immune 
system mounts a response against these proteins. Nonetheless, an 
individual immune to one subtype of influenza may not be able to 
mount an immune response to a new subtype with modified 
hemagglutinin or neuraminidase. Genetic mutations that change
one or more amino acids within HA or NA are
responsible for the recurrence of minor epidemics of influenza in
two- to three-year cycles. This is referred to as antigenic drift.

More dramatic changes, called antigenic shifts, occur when multiple
viruses cause coinfections in animal cells (Fig. 4). For example, aquatic 
birds serve as reservoirs for the influenza-A virus. Some, but not all, 
types of bird influenza can infect humans directly. Occasionally, a new 
form of the virus — a new human pathogen — arises when multiple 
viruses infect the same cell. The mixing vessel is often the pig, which 
can be infected by both the bird and human forms of the virus. 
Influenza is an RNA virus and its genome is oddly segmented. Genes 
for HA and NA are found among the eight distinct fragments of 
single-stranded RNA. If a pig cell is infected with viruses from two 
different sources, RNA segments might be exchanged. Such genetic 
exchange can dramatically change the nature of the spikes found on 
the newly derived virus. Major pandemics of influenza, including the 
1918 flu and the "Hong Kong" flu of 1968, have occurred immediately 
after antigenic shifts have taken place. Farms and markets where 
poultry, pigs, and humans come in close contact are considered 
important to the emergence of new subtypes of influenza. 
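
The scale of this reshuffling can be illustrated with a short Python
sketch. It assumes, purely for illustration, that a co-infected cell can
package any mix of segments from two parent viruses and that every mix is
viable, which real viruses do not guarantee. The segment names are the
conventional ones for influenza A and are not taken from this unit; the
"avian" and "human" labels are placeholders.

```python
from itertools import product

# Conventional names for the eight influenza A genome segments.
SEGMENTS = ("PB2", "PB1", "PA", "HA", "NP", "NA", "M", "NS")


def reassortants(parent_a, parent_b):
    """List every way to draw each segment from one of two parent viruses."""
    return [dict(zip(SEGMENTS, choice))
            for choice in product((parent_a, parent_b), repeat=len(SEGMENTS))]


def is_novel(genotype):
    """True if the genotype mixes segments from both parents."""
    return len(set(genotype.values())) == 2


if __name__ == "__main__":
    combos = reassortants("avian", "human")
    novel = [g for g in combos if is_novel(g)]
    # Mixed genotypes that carry the avian hemagglutinin segment.
    avian_ha = [g for g in novel if g["HA"] == "avian"]
    print(len(combos), len(novel), len(avian_ha))
```

With two parents and eight segments there are 2^8 = 256 possible
combinations, 254 of which mix segments from both parents.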

Lyme disease and influenza are just two examples of diseases that 
have emerged because of human contact with animal reservoirs. 
Understanding the epidemiology of other emerging infections, 
such as hantavirus and Ebola, also depends on an understanding
of animal hosts. 

Figure 4. A new form of a virus can
arise when multiple viruses infect the 
same animal cell. Segments of nucleic 
acid can be exchanged, resulting in a 
novel pathogen. 

Insect Vectors

Insects provide a system that can deliver pathogens directly to the 
bloodstream and are essential to the spread of some infections. From a 
pathogen's perspective, moving from host to host is essential to 
survival; yet, the skin presents a barrier. Wounds, burns, and catheters 
provide opportunities for entry for some pathogens, but insect-borne 
bacteria have an advantage. Still, arthropod-transmitted microbes 
must be able to survive in the arthropod's gut, proliferate, and then 
become positioned (such as in the insect's salivary gland) for delivery to 
the animal host. 

Malaria 

Members of the protozoan genus Plasmodium, which cause malaria,
have evolved a successful relationship with their arthropod vector, the 
Anopheles mosquito. Malaria is prevalent in areas where this mosquito 
thrives — in parts of Africa, Asia, and China. Three million people die 
every year from Plasmodium infections. Between 1950 and 1970, 
efforts to eradicate malaria involved the use of the insecticide DDT. 
Unfortunately, mosquitoes developed resistance to the spray. Now 
considered a reemerging disease, malaria is on the rise as
eradication programs have failed and drug-resistant strains of the parasite
have evolved. The complex life cycle of the parasite makes
development of vaccines difficult, and efforts to reduce malaria by
controlling its insect vector continue (Fig. 5).




Figure 5. Sporozoites are delivered
to the human bloodstream from the 
salivary gland of the Anopheles 
mosquito when the insect bites. 
In the liver, the sporozoites multiply 
and become merozoites. The 
merozoites enter red blood cells and 
become trophozoites. Red blood cells 
rupture and new merozoites, which 
have developed from the trophozoites, 
are released. Gametocytes (the sexual 
stage) are eventually produced. 
Gametocytes taken up by the mosquito 
in a blood meal fuse to form zygotes, 
which give rise to sporozoites. 

Dengue

Approximately eighty viruses depend on insects for transmission. The 
virus that causes dengue and dengue hemorrhagic fever has the 
broadest distribution, comparable to that of malaria. Approximately 
2.5 billion people live in areas at risk for dengue, and millions are 
afflicted each year. The fatality rate is about five percent, with most 
fatalities occurring in children and young adults. Transmitted by the 
mosquito Aedes aegypti, dengue, or "breakbone fever," causes a range 
of symptoms: nausea and weakness, severe bone and joint pain, and 
high fever. 

Four immunologically distinct types of the virus exist, so individuals can 
contract the disease four times during their lifetime. An infection with 
a second subtype of the virus may result in a severe hemorrhagic 
disease, involving leakage of blood or fluid from mucous membranes. 
The hemorrhage seems to involve an immune reaction, resulting from 
sensitization in a previous infection. A global pandemic of dengue 
began in southeast Asia after World War II. In the 1980s dengue 
hemorrhagic fever began a second expansion into Asia, with epidemics 
in Sri Lanka, India, the Maldive Islands, and, in 1994, Pakistan. During 
the 1980s epidemic dengue arose in China, Taiwan, and Africa. Aedes
aegypti and an alternate mosquito vector, Aedes albopictus, are 
present in the United States (Fig. 6). Two outbreaks of dengue were 
reported in Texas during the 1980s, which were associated with 
epidemics in northern Mexico. 




Figure 6. Distribution of the mosquito
Aedes aegypti, the vector for
dengue/dengue hemorrhagic fever. 
A mosquito eradication program 
administered by the Pan American 
Health Organization ended in 1970. 



The dramatic global emergence of dengue relates in part to the lack of 
effective mosquito control in afflicted countries. Often, deteriorating 
public health infrastructures are to blame. 

Climate and Weather

Arthropods, important in the spread of many diseases, are particularly 
sensitive to meteorological conditions. Anopheles mosquitoes, for 
example, only transmit malaria where temperatures routinely exceed 
60°F. Temperature influences the proliferation rate of the mosquito, as 
well as the maturation rate of the parasite within the insect. 
Mosquitoes live only a few weeks; warmer temperatures raise the odds 
that the parasites will mature in time for the insect to spread the 
protozoans to humans. 

Global climate change has already altered the species ranges of a 
number of animals and plants. (See the Biodiversity unit.) Further 
change may increase the range of the mosquito vectors that transmit 
disease. This could expose sixty percent of the world's population to 
malaria-carrying mosquitoes. (Forty-five percent of the human
population now resides in a zone of potential malaria transmission.) In
fact, malaria is reappearing in areas north and south of the tropics, 
including the Korean peninsula and areas of Europe. During the 1990s 
outbreaks of locally transmitted malaria occurred in Texas, Florida, 
Georgia, Michigan, New Jersey, New York, and Ontario. Although 
these incidents probably started with a traveler or stowaway mosquito, 
conditions were such that the infection could be transmitted to 
individuals who had not been traveling. 

Cholera and Global Climate Change 

Global climate change may also bring flooding. In addition to creating 
breeding grounds for insects, this could increase the incidence of 
water-borne diseases such as cholera. The bacterium Vibrio cholerae 
causes seasonal outbreaks of intestinal infection so severe that 
individuals can lose as much as twenty-two liters (six gallons) of fluid 
per day. The intestinal lining becomes shredded so that white flecks, 
resembling rice grains, are passed in feces. Without adequate fluid 
replacement, death can occur in hours. During a 1991 epidemic in 
Bangladesh 200,000 cases were counted in only three months. 

Historically, cholera (caused by V. cholerae) has been a problem in
coastal cities, especially those where the quality of the water supply is
poor. In a groundbreaking 1854 study, John Snow mapped cholera
deaths in London and realized that victims had been drinking from the 
same well. The association between cholera and contaminated water 
was established, and appropriate water treatment seemed to bring the 
threat under control. Yet, especially in areas where water treatment is 
unaffordable, cholera epidemics continue. 

Where does Vibrio cholerae go between epidemics? This question 
intrigued Rita Colwell and her associates. Surprisingly, they found the 
bacterium in Chesapeake Bay in a dormant, spore-like form that was 
difficult to culture in the laboratory. Colwell used antibodies, directed 
to a component of the bacteria's cell membrane, and was able to 
detect the dormant organism. In this form V. cholerae survives in a
range of habitats, including seawater, brackish water, rivers, and 
estuaries. Colwell also found that wherever tiny crustaceans known as 
copepods were abundant so were the bacteria, which cling to the 
copepod and colonize its gut. 



Understanding the reservoir for cholera may be important to 
unraveling the periodicity of epidemics. Colwell turned her attention 
to locations where cholera outbreaks were common, such as in 
Bangladesh. By reviewing data from satellite monitors, she noticed 
that seasonal peaks in sea-surface temperatures in the Bay of Bengal 
correlated with the number of cholera admissions in nearby hospitals. 
Similar correlations existed between sea-surface temperatures and 
South American cholera epidemics in the 1990s. It is possible that the 
rise in temperature raises sea-surface height, driving seawater into 
estuaries. Alternatively, rising temperatures might provide the right set
of environmental conditions to boost copepod populations, perhaps by 
increasing populations of the photosynthetic plankton, which 
copepods feed upon. In either case, recognizing the association 
between sea-surface temperature and cholera incidence may make 
epidemics easier to predict. The relationship between climate and 
epidemic also increases the concerns raised by global climate change. 
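
The kind of association Colwell noticed can be quantified with a
correlation coefficient. The Python sketch below computes Pearson's r by
hand. The monthly sea-surface temperatures and hospital admissions are
invented numbers, not Colwell's data, and a high r would indicate only an
association, not a cause.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = sum((x - mean_x) ** 2 for x in xs)
    spread_y = sum((y - mean_y) ** 2 for y in ys)
    return covariance / sqrt(spread_x * spread_y)


# Invented monthly values for illustration only (assumptions).
SEA_SURFACE_TEMP_C = [25.1, 26.0, 27.4, 28.9, 29.5, 29.0, 28.2, 27.0]
CHOLERA_ADMISSIONS = [12, 15, 30, 55, 70, 62, 41, 25]

if __name__ == "__main__":
    r = pearson_r(SEA_SURFACE_TEMP_C, CHOLERA_ADMISSIONS)
    print(f"r = {r:.2f}")
```

A value of r near +1 means that months with warmer water tended to be
months with more admissions. Real surveillance work would also need to
account for time lags and other seasonal factors.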

Climate and Hantavirus 

Weather patterns can also influence the numbers of vertebrate 
animals serving as reservoirs for human pathogens. In 1993, in the 
Four Corners area of the United States (where New Mexico, Arizona, 
Utah, and Colorado meet), researchers tracked an outbreak of 
pulmonary illness that killed half of those infected. The causative 
agent, hantavirus, was not a new threat but was endemic in the 
rodent population of the area. Researchers were able to find the 
deadly virus in mouse tissue archived years earlier. Hantavirus spreads 
to humans by rodent urine and droppings. During the mild, wet 
winter of 1993, piñon nuts, a favored food for the deer mouse,
flourished. As rodent populations soared, the opportunities for
mouse-human interactions increased. Native American legend
describes an association between piñon nut abundance and illness.
Scientists found an association between the periodic climate pattern
El Niño-Southern Oscillation and outbreaks of hantavirus.

Medical practices, the adaptability of microbes, global travel, 
crowding, human susceptibility, alternate vertebrate hosts, insect 
vectors, and climate are just some of the factors that influence the 
emergence of disease. In most cases the interplay between multiple 
factors must be understood. Not the least of these is the deterioration of
public health systems in many countries where substandard water and
waste management continue. War and famine also set up conditions
that lead to the emergence of disease and, especially in poor nations, 
the political impetus to implement prevention and control strategies 
is often lacking. 

Preventing and Controlling Emerging 
Infectious Disease 

The prevention and control of emerging infectious diseases requires a 
global perspective that accounts for biocomplexity, all the interrelated 
factors that contribute to the evolution and survival of infectious agents. 
Individuals from many disciplines — biologists, chemists, statisticians, 
atmospheric scientists, and ecologists — must work together. Effective 
surveillance is essential. Multiple control measures will often be 
appropriate. New genomic and proteomic techniques may provide not 
only more effective detection but also prevention by novel vaccines. 

The effective interaction between public health officials and
individuals from a variety of disciplines was exemplified during the 
West Nile virus outbreak that occurred in the New York City area 
during the summer of 1999. By mid-October forty-eight people had 
demonstrated an unusual illness characterized by fever, extreme 
muscle weakness, and pneumonia-like symptoms. Four had died. 
Encephalitis or meningitis was present in a few of the more serious 
cases. West Nile virus was identified as the causative agent using 
antibody-based tests and DNA comparisons. 

West Nile virus hails from Africa, Australia, and the Middle East, and 
had never been seen in the Western Hemisphere. At the time city 
wildlife officials and veterinarians at the nearby Bronx Zoo were
struggling with a peculiar infection among crows and the zoo's 
collection of exotic birds. Brain hemorrhages and heart lesions were 
observed in dissected birds; DNA analysis showed the presence of West 
Nile virus. The discovery of the virus in wild bird populations, which 
could potentially serve as a reservoir for human disease, spawned 
surveillance of birds throughout the United States. Concerns that 
migratory birds would spread the virus rose. Flocks of chickens were 
used to monitor viral spread. In the meantime New York City began 
spraying for mosquitoes. By September 2002 the virus had infected a 
woman in Los Angeles. Continued surveillance of the bird population 
and continuing communication between wildlife experts, public health 
workers, and others will be instrumental in curtailing this infection in 
the United States. 

Effective surveillance is a critical step in preventing the spread of 
emerging diseases. For example, the new influenza vaccine available 
each year is the result of constant vigilance. The World Health 
Organization and others identify the strains of influenza most likely to 
cause infection in the coming year and define the vaccine based on 
their findings. In this case, an understanding of the animal reservoirs of 
the disease is important to the surveillance effort. The emergence of 
novel strains is most likely where poultry, pigs, and humans come in 
close contact. As a result, monitoring is conducted where such 
conditions abound. 

It is often necessary to take multiple measures to control disease. In the 
case of malaria the first steps to prevention are as simple as the use of 
bed nets for reducing bites from mosquitoes and more frequent 
draining of flooded environments (such as rice fields) where 
mosquitoes thrive. In the end, DNA-based vaccines, founded on an 
understanding of the complex life cycle of the protozoal parasite, may 
be the answer. 

Daniel Carucci of the U.S. Naval Medical Research Center and others 
have identified various proteins that are expressed by the malarial 
parasite during different stages of its life cycle. Some of these proteins 
should be recognized as foreign by the immune system and might 
serve as vaccines. The goal is to stimulate the production of not only 
antibodies but also cellular immunity specific for various stages of the 
parasite. (See the HIV and AIDS unit for an introduction to the immune 
system.) Rather than injecting the proteins into individuals, Carucci is 
evaluating the use of DNA vaccines. Such vaccines usually comprise 
DNA, encoding the protein(s) of interest, adsorbed onto gold particles 
and injected with an air gun into muscle tissue. The expression of 
malarial proteins by recipient cells and the subsequent immune 
response to the proteins is being evaluated. If successful, DNA-based 
vaccines might offer advantages over traditional vaccines; they are less 
expensive to prepare and easier to store than protein-based vaccines. 
However, DNA can serve as an immunogen itself; it is thought that 
diseases such as lupus result from an immune reaction to DNA. As 
vaccine development continues, traditional public health measures to 
prevent and treat malaria remain essential. 

The threat from established and evolving disease organisms remains 
with us. Given high reproductive rates and mechanisms for lateral gene 
transfer, microbes can adapt to and rapidly circumvent the best 
treatments scientists develop. We have seen how new diseases arise 
and spread when humans interact with each other or with the 
environment in new ways. The anthrax attacks in the fall of 2001 
remind us that the threat of bioterrorism continues. This ancient form 
of warfare dates as far back as 1346 when the Tartar army catapulted 
the bodies of plague victims into the city of Kaffa. 

The journalist Laurie Garrett has suggested that because human 
behavior influences the emergence of disease, we have significant 
control over our struggle with microbes. Certainly, our understanding 
of the factors that contribute to the evolution of new pathogens is 
continuing to increase, and experience with such outbreaks as West 
Nile has helped hone surveillance and control measures. However, the 
global nature of disease means that public health strategies must be 
global as well. 

Glossary. 



Antigenic drift. Changes in one 
or more amino acids in proteins 
on the outer envelope of a virus 
such as influenza. Because 
individuals may not be immune to 
these modified viruses, minor 
epidemics can result. 

Antigenic shift. Changes in 
proteins in the outer envelope of 
a virus, resulting from the 
reassortment of viral genes. Major 
epidemics of influenza occur after 
antigenic shifts have taken place 
because individuals are not 
immune to the substantially 
modified viruses. 

Bacteriocin. Proteins produced 
by some bacteria, which inhibit 
the growth of other strains of the 
same organism or related species. 
Genes for bacteriocins may reside 
on plasmids. 

Conjugation. Cell-to-cell contact 
in which DNA copied from a 
plasmid or chromosome is 
transferred to a recipient cell. It 
can contribute to lateral gene 
transfer when it occurs between 
distantly related bacteria. 

F-plasmid. A fertility plasmid, 
which contains genes that 
allow for conjugation of 
certain bacteria. 

Hemorrhagic disease. Diseases 
characterized by the leakage of 
blood or fluid from mucous 
membranes. 



Hfr (high frequency of 
recombination). A strain of 
bacteria in which an F-plasmid has 
become incorporated into the 
bacterial chromosome. 

Lateral gene transfer. Also 
referred to as horizontal gene 
transfer. The transmission of genes 
directly between organisms, 
particularly bacteria, and not from 
parent to offspring. 

Septicemia. The rapid 
proliferation of pathogens in 
the blood. 

Transduction. The movement 
of genetic material from one 
bacterium to another by means 
of a bacteriophage. 

Transformation. The uptake of 
"naked" DNA by a bacterium. 

Transposon. A DNA sequence 
that encodes various genes, 
including those that allow the 
sequence to jump to other 
positions within the DNA strand 
or to other strands of DNA. 

Zoonosis. A disease that can be 
transmitted from other vertebrate 
animals to humans. 

HIV and AIDS 



"The human immunodeficiency virus (HIV) epidemic has 
spawned a scientific effort unprecedented in the history 
of infectious disease research. This effort has merged 
aspects of clinical research, basic molecular biology, 
immunology, cell biology, epidemiology, and mathematical 
modeling in ways that have not been seen before. The 
ever unfolding discoveries of novel aspects of HIV-host 
interaction have been accompanied by (and often have 
resulted from) novel interactions among researchers in 
the disparate disciplines." John Coffin 1 

In the late 1970s young homosexual men were dying from rare cancers 
and pneumonias caused by usually benign microbes. Such conditions, 
which result from failures of the immune system, became indicators of 
what is now called acquired immunodeficiency syndrome (AIDS). 
Although the causative virus, human immunodeficiency virus (HIV), 
was identified in 1983, there is still no cure for AIDS. In the years since, 
HIV has killed millions of men, women, and children from all economic 
classes, representing every race, from countries around the world. Each 
day in 2003, 15,000 more individuals became infected and 8,000 died. 

HIV remains a major problem for several reasons. The virus has an 
extraordinarily high mutation rate, such that an infected individual 
often harbors many variations. This high mutation rate allows HIV to 
easily evolve resistance to the drugs used to treat it. In addition, cells 
essential to a strong immune response harbor a virus that can lie 
latent for years. Thus, the development of treatments and vaccines 
depends not only on knowledge of the complex life cycle of the virus, 
but also on understanding the intricate choreography of the immune 
system. Controlling HIV will require more than the development of 
medicines and vaccines, however, because poverty and politics exclude 
millions from treatment. 

The Immune System 

Understanding the various components of the immune system and the 
complex signaling that takes place between immune cells is key to 
understanding HIV. Both non-specific and specific lines of defense help 
thwart the invasion of pathogens. Non-specific defenses act quickly 
and indiscriminately to exclude microbes from the body or actively kill 
intruders. Mechanical barriers — such as the mucus, hairs, and cilia in 
the respiratory tract, and the flow of urine through the urinary tract — 
are among these non-specific defenses. Skin oils and chemicals in 
perspiration and gastric juices also serve as non-specific barriers. 
Mechanisms involving complex chemical signals such as fever and 
inflammation also act against a wide variety of pathogens. One non- 
specific defense involves phagocytes, a particular type of leukocyte 
(white blood cell), which act as cellular "Pac-Men," engulfing and 
digesting microbes or other irritants like dust and pollen. 

If invaders have breached the non-specific defenses, the immune 
system will use a variety of leukocytes to mount directed defenses 
against specific invaders. Lymphocytes bind and respond to specific 
foreign molecules (antigens). One subset of lymphocytes, the B cells, 
matures into antibody-secreting cells. Another subset of lymphocytes, 
the T cells, includes immune cells that directly kill cancerous or virally 
infected cells. Some subtypes of T cells serve a regulatory function, 
releasing chemical signals that can stimulate or suppress a variety of 
immune functions. Because HIV preferentially infects one of these 
regulatory T cells, the so-called helper T (TH) cell, it can subvert and 
decimate the immune system, leading to AIDS. 



Table 1. Types of Leukocytes (white blood cells) 

Respond non-specifically 

  Granulocytes (contain cytoplasmic granules) 

    Basophils: Important in inflammation and allergic responses. 

    Neutrophils: Phagocytic; during inflammation they squeeze through 
    capillaries to destroy microbes in tissue. 

    Eosinophils: Phagocytic; elevated in allergy and in parasite 
    infections. 

  Monocytes and macrophages 

    Monocytes that leave the circulation then mature into highly 
    phagocytic macrophages in tissue. "Fixed" macrophages stay in 
    certain places, such as the lymph nodes or the lung. 

Interact with specific antigens 

  Lymphocytes 

    B cells: Cells that provide immunity to antigens circulating in the 
    blood, such as bacteria, toxins, and circulating viruses. B cells 
    mature from stem cells in the bone marrow. Once they encounter 
    antigen, B cells mature into plasma cells that secrete antibodies. 

    T cells*: Cells that provide cellular immunity to antigens inside or 
    associated with cells, such as cancer cells or cells infected with a 
    virus. They also help clear infections caused by fungi and worms, 
    and contribute to transplant rejection. They mature from stem cells 
    in the thymus. 

    Types of T cells include: 

    • Tc - cytotoxic T cells; lyse cells expressing foreign antigens 
    • TH - helper cells; secrete chemicals that enhance Tc and B cell 
      responses 
    • Ts - suppressor cells; reduce Tc and B cell responses 
    • TD - delayed hypersensitivity cells involved in certain 
      allergic-like responses 

* T cells can be differentiated, in part, based on certain proteins on 
their surfaces. Helper T cells, which are often called T4 cells, express 
the CD4 protein. Cytotoxic T cells and suppressor T cells express CD8. 
Because HIV infects helper T cells, the ratio of CD4 to CD8 cells is 
valuable for monitoring the course of infection. 

The Central Role of Helper T Cells 

Helper T (TH) cells are critical to coordinating the activity of the 
immune response. The chemical messages they secrete (cytokines) 
stimulate the non-specific immune response to continue, and 
strengthen and boost appropriate specific responses. Helper T cells 
have sometimes been called the "conductors" of the immune system 
because they coordinate activity like the conductor of a symphony. 
They have also been called the "generals" of the immune system 
because they call up troops of B cells, cytotoxic T cells, and other helper 
T cells to go into battle against invading pathogens (Fig. 1). 

Macrophages alert helper T cells to the presence of pathogens. These 
phagocytic macrophages engulf bacteria and viruses, and can display 
foreign antigens — the identifying proteins of the bacteria or viruses 
— on the surface of their cell membrane. Embedded within the 
macrophage cell membrane is a molecule produced by the human 
leukocyte antigen (HLA) complex, the human version of the major 
histocompatibility complex (MHC). (See the Human Evolution unit.) The 
helper T cells bind simultaneously to the foreign antigen and the HLA 
molecule. Only TH cells with receptors that match the foreign 
antigen on the activated macrophage are able to bind and respond to 
the call to action. Once bound, the helper T cell proliferates to form a 
clone of cells, each capable of recognizing the same antigen. The 
members of the helper T clone, the generals, generate the chemical 
signals that call up the troops. 

Figure 1. A specialized macrophage ingests foreign antigens and displays 
antigen fragments along with MHC (self) molecules on its surface. A 
helper T cell (TH) with the appropriate receptor binds and responds by 
producing cytokines that stimulate antigen-specific B cells, as well as 
specific cytotoxic T cells. 

Some signals sent by helper T cells stimulate cytotoxic T cells (Tc). 
Cytotoxic T cells (also known as killer T cells) bind cells that have been 
altered, such as by viral infection; they avoid healthy cells. Surface 
antigens on the altered cell mediate the binding. These antigens are 
specific to the offending agent, and match receptors in the membrane 
of the specific Tc cell. In addition, the Tc cell simultaneously binds an 
MHC molecule on the surface of the infected cell. Once bound to both 
the foreign antigen and the MHC molecule, the cytotoxic T cell secretes 
a chemical called "perforin," which destroys the offending cell (Fig. 2). 

Helper T cells also stimulate the production of antibodies. Chemical 
signals from helper T cells stimulate the production of B cells specific to 
an infecting pathogen, and then stimulate the B cells to differentiate 
into plasma cells. The plasma cells are factories for the production of 
antibodies, which are specific to given pathogens circulating in blood 
or lymph. Antibodies work by blocking the receptors that allow 
pathogens to attach to target cells, or by creating clumps of bacteria. 
Clumping makes the job of phagocytes easier, as they will more readily 
engulf bacteria in clumps. Bound antibodies sometimes serve as tags, 
called opsonins, enhancing phagocytosis. Antibody binding can also 
initiate a cascade of biochemical reactions, activating a set of chemicals 
known as complement. Activated complement components can form 
holes in bacterial membranes and enhance inflammation. 

Helper T cells are clearly critical to the operation of the immune 
system. If they are destroyed because of an HIV infection, the whole 
system is crippled. The immune system is described as having two 
"arms": the cellular arm, which depends on T cells to mediate attacks 
on virally infected or cancerous cells; and the humoral arm, which 
depends on antibodies to clear antigens circulating in blood and 
lymph. As an HIV infection progresses, destroying helper T cells, both 
arms of immunity are impaired. 

Figure 2. Binding by both the antigen and an MHC molecule initiates the 
secretion of lytic enzymes by the cytotoxic T cell (Tc). 

The Structure and Life Cycle of HIV 

How does HIV evade the immune system so efficiently? Why are so 
many variants of the virus found in a single patient? Understanding 
the structure and life cycle of the virus is key to answering these 
questions and essential to the design of effective treatments. 

Figure 3. Binding of HIV to a host cell. gp120 on the virus binds CD4 
receptors on the host. A second coreceptor molecule on the host is also 
required for binding. 



HIV is an enveloped RNA virus: As HIV buds out of the host cell during 
replication, it acquires a phospholipid envelope. Protruding from the 
envelope are peg-like structures that the viral RNA encodes. Each peg 
consists of three or four gp41 glycoproteins (the stem), capped with 
three or four gp120 glycoproteins. Inside the envelope the bullet- 
shaped nucleocapsid of the virus is composed of protein and surrounds 
two single strands of RNA. Three enzymes important to the virus's life 
cycle — reverse transcriptase, integrase, and protease — are also 
within the nucleocapsid (Fig. 3). 

Although helper T cells seem to be the main target for HIV, other cells 
can become infected as well. These include monocytes and 
macrophages, which can hold large numbers of viruses within 
themselves without being killed. Some T cells harbor similar reservoirs 
of the virus. 

Entry of HIV into the host cell requires the binding of one or more 
gp120 molecules on the virus to CD4 molecules on the host cell's 
surface. Binding to a second receptor is also required. Ed Berger helped 
identify this coreceptor. As he compared his results with those of other 
researchers, it became clear that two different coreceptors are involved 
in the binding. One, CCR5, a chemokine receptor, serves as a 
coreceptor early in an infection. Another chemokine receptor (CXCR4) 
later serves as a coreceptor. That two coreceptors are involved is 
consistent with previous observations. Viruses isolated from individuals 
early in an infection, during the asymptomatic phase, will typically 
infect macrophages in the laboratory, but not T cells (the viruses are 
M-tropic). Virus isolated from patients later in the infection, in the 
symptomatic phase, will infect T cells (the viruses are T-tropic). It seems 
that a shift takes place in the viral population during the progression 
of the infection so that new cellular receptors are used and different 
cells become infected. 

HIV is a member of the group of viruses known as retroviruses, which 
share a unique life cycle (Fig. 4). Once HIV binds to a host cell, the viral 
envelope fuses with the cell membrane, and the virus's RNA and 
enzymes enter the cytoplasm. HIV, like other retroviruses, contains an 
enzyme called reverse transcriptase. This allows the single-stranded 
RNA of the virus to be copied and double-stranded DNA (dsDNA) to 
be generated. The enzyme integrase then facilitates the integration of 
this viral DNA into the cellular chromosome. Provirus (HIV DNA) is 
replicated along with the chromosome when the cell divides. The 
integration of provirus into the host DNA provides the latency that 
enables the virus to evade host responses so effectively. 
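
The copying step described above can be illustrated with a short Python sketch. It is only a toy model under simplifying assumptions: the sequence is a made-up fragment rather than real HIV RNA, the function names are hypothetical, and the primers, RNA degradation, and strand transfers involved in actual reverse transcription are ignored. It shows only how base pairing turns single-stranded RNA into the two strands of dsDNA.

```python
# Toy model of reverse transcription: base pairing only.
# Real reverse transcription also involves primers and strand transfers,
# which are not modeled here.

RNA_TO_DNA = {"A": "T", "U": "A", "G": "C", "C": "G"}
DNA_TO_DNA = {"A": "T", "T": "A", "G": "C", "C": "G"}


def reverse_transcribe(rna: str) -> str:
    """Return the DNA strand complementary to an RNA template (5'->3')."""
    return "".join(RNA_TO_DNA[base] for base in reversed(rna.upper()))


def second_strand(dna: str) -> str:
    """Return the DNA strand complementary to a DNA strand (5'->3')."""
    return "".join(DNA_TO_DNA[base] for base in reversed(dna.upper()))


def to_double_stranded_dna(rna: str) -> tuple[str, str]:
    """Copy single-stranded RNA into the two strands of dsDNA."""
    first = reverse_transcribe(rna)
    return second_strand(first), first


viral_rna = "AUGGCUAGA"  # hypothetical fragment, not a real HIV sequence
plus_strand, minus_strand = to_double_stranded_dna(viral_rna)
print(plus_strand)   # same sequence as the RNA, with T in place of U
print(minus_strand)  # the complementary strand
```

One strand of the resulting DNA carries the same information as the original RNA, which is why the integrated provirus can later be transcribed back into viral RNA.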

Production of viral proteins and RNA takes place when the provirus is 
transcribed. Viral proteins are then assembled using the host cell's 
protein-making machinery. The virus's protease enzyme allows for the 
processing of newly translated polypeptides into the proteins, which 
are then ultimately assembled into viral particles. The virus eventually 
buds out of the cell. A retrovirus does not necessarily lyse the 
infected cell when viral replication takes place; rather, 
many viral particles can bud out of a cell over the course of time. 

Figure 4. 1) Membranes of the virus and the host cell fuse, and viral 
RNA and reverse transcriptase enter the host's cytoplasm. 2) Reverse 
transcriptase allows viral RNA to be copied to DNA. 3) Viral DNA is 
incorporated into the host chromosome as provirus. 4) Transcription 
and translation of viral proteins: viral RNA becomes incorporated into 
viral particles and is transcribed as well. 5) Viral particles bud out 
of the host cell, acquiring an envelope in the process. 

HIV Transmission 

HIV is transmitted principally in three ways: by sexual contact, by blood 
(through transfusion, blood products, or contaminated needles), or by 
passage from mother to child. Although homosexual contact remains a 
major source of HIV within the United States, "heterosexual 
transmission is the most important means of HIV spread worldwide 
today." 2 Treatment of blood products and donor screening have 
essentially eliminated the risk of HIV from contaminated blood 
products in developed countries, but its spread continues among 
intravenous drug users who share needles. In developing countries, 
contaminated blood and contaminated needles remain important 
means of infection. Thirteen to thirty-five percent of pregnant women 
infected with HIV will pass the infection on to their babies; 
transmission occurs in utero, as well as during birth. Breast milk from 
infected mothers has also been shown to contain high levels of the 
virus. HIV is not spread by the fecal-oral route; aerosols; insects; or 
casual contact, such as sharing household items or hugging. The risk to 
health care workers is primarily from direct inoculation by needle 
sticks. Although saliva can contain small quantities of the virus, the 
virus cannot be spread by kissing. 

Progression of HIV Infection 

Characteristically, an HIV infection can progress for eight to ten years 
before the clinical syndrome (AIDS) occurs. The long latent period of 
the virus has contributed to many of the problems relating to diagnosis 
and control. The basketball player Magic Johnson was still relatively 
healthy twelve years after he announced he had HIV. On the other 
hand, not all cases exhibit the long latent period, and abrupt 
progression to AIDS occurs. Many factors, including genetics, determine 
the speed at which the disease will progress in a given individual. 

The Centers for Disease Control and Prevention (CDC) has identified 
the stages of a typical HIV infection: Categories A, B, and C. In the first 
stage, Category A, it can be difficult to determine whether an 
individual is infected without performing a blood test. While at least 
half of infected individuals will develop a mononucleosis-like illness 
(headache, muscle ache, sore throat, fever, and swollen lymph nodes) 
within three weeks of exposure, some Category A individuals are 
asymptomatic. Moreover, the symptoms themselves can be the result of 
many different infections. The presence of a rash may help 
differentiate an HIV infection from other infections, but not all HIV- 
infected individuals get a rash. Most of these signs and symptoms 
subside, but swollen lymph glands and malaise can persist for years 
through Category A HIV. 

Category A: Asymptomatic or chronic lymphadenopathy 
Category B: Symptomatic; early indications of immune failure 
Category C: AIDS indicator conditions 

The number of virus particles circulating in the bloodstream is usually 
highest soon after exposure. At this point the CD4 cell population 
plunges (helper T cells are among the immune cells that express the 
CD4 receptor, which can be used as a marker for counting cell types). As 
antibodies to HIV appear the numbers of CD4 cells rise; however, CD4 
cell levels drop again as the infection progresses. This lowering of CD4 
cell levels typically happens slowly, over the course of years. Category C 
HIV (clinical AIDS) occurs once CD4 numbers have fallen substantially (to 
200 cells/mm³ from the normal level of 800-1200 cells/mm³). 
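
The two reference levels in this paragraph can be turned into a small Python sketch that places a CD4 count relative to them. The function name and the handling of boundary values (for example, treating a count of exactly 200 as "at the AIDS level") are assumptions made for illustration; this is not a diagnostic rule.

```python
# Levels quoted in the text, in cells per cubic millimeter of blood.
AIDS_LEVEL = 200
NORMAL_LOW, NORMAL_HIGH = 800, 1200


def describe_cd4_count(cells_per_mm3: float) -> str:
    """Place a CD4 count relative to the levels mentioned in the text.

    Boundary handling is an illustrative assumption, not clinical guidance.
    """
    if cells_per_mm3 < 0:
        raise ValueError("a cell count cannot be negative")
    if cells_per_mm3 <= AIDS_LEVEL:
        return "at or below the level associated with clinical AIDS"
    if cells_per_mm3 < NORMAL_LOW:
        return "below the normal range"
    if cells_per_mm3 <= NORMAL_HIGH:
        return "within the normal range"
    return "above the normal range"


for count in (1000, 450, 150):
    print(count, "->", describe_cd4_count(count))
```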

In the Category B stage indications of immune system failure begin. 
Persistent infections — such as yeast infections, shingles, diarrhea, and 
certain cancerous conditions of the cervix — are apparent. 

Category C is synonymous with AIDS. In this stage the opportunistic 
infections associated with AIDS appear. According to the CDC, twenty- 
six known clinical conditions affect people with AIDS; most are 
infections that do not usually affect healthy individuals. These include 
yeast infections of the esophagus, bronchi, and lungs; Pneumocystis 
pneumonia (a fungal infection); toxoplasmosis (caused by a protozoan 
that is spread by cats); Kaposi's sarcoma (a rare cancer of the skin 
caused by a virus); cytomegalovirus (CMV) infections; and tuberculosis. 
In addition, individuals who have been affected by HIV are more likely 
to become seriously ill or die than other members of the population 
during outbreaks of infections such as Cryptosporidium (a water-borne 
parasite) and coccidioidomycosis (a dust-borne fungus). 



Figure 5. Typical Progression of HIV Infection and AIDS. 

Cytomegalovirus (CMV) causes another opportunistic infection 
prevalent in AIDS patients. About eighty percent of people in the U.S. 
have antibodies to this virus, but infections in normal individuals often 
go undetected or seem like a mild case of mononucleosis. In the 
immunocompromised, however, CMV can cause life-threatening 
pneumonia or encephalitis. In AIDS patients CMV that has been latent 
can reactivate and sometimes cause retinitis, affecting eyesight. 

Tuberculosis (caused by Mycobacterium tuberculosis) has been on the 
rise in the wake of AIDS, such that some call it a coepidemic. M. 
tuberculosis causes a respiratory infection (formerly called 
consumption) that is spread by inhalation. As a result, unlike with HIV, 
behavior modification is less likely to reduce one's chances of exposure. 
The bacteria, which have an unusually waxy cell wall, survive well in 
the environment. M. tuberculosis reproduces inside macrophages 
found in the lung, and stimulates the production of aggregates of 
immune cells and connective tissue, called tubercles. Viable organisms 
can be walled off within such structures for decades, only to become 
reactivated when a person becomes compromised. Most tuberculosis in 
AIDS patients results from reactivated infections. AIDS patients suffer 
not only from respiratory infection but also from disseminated 
tuberculosis, which can involve the lymphatic system, peritoneum, 
meninges, urogenital system, or digestive tract. Antibiotic-resistant 
mycobacteria are also contributing to the rise of tuberculosis, so that 
second- and third-line drugs must often be used. And because 
treatments are prolonged, lasting as long as a year, patients sometimes 
do not complete therapy appropriately. Mycobacteria other than M. 
tuberculosis, particularly M. avium-intracellulare (MAC), also affect 
AIDS patients. 

Why Do Some Individuals Never Get AIDS? 

Despite repeated exposure, some individuals never become infected 
with HIV. These individuals often have unusual helper T cells with a 
less-efficient variant of the coreceptor CCR5, which is necessary for 
viral entry into helper T cells. (See the Human Evolution unit.) 

There are also individuals who become infected, but do not progress to 
AIDS. These long-term survivors, or long-term non-progressors, include 
individuals who have been AIDS-free as long as eighteen years after 
infection. A variety of factors may be responsible; for example, 
infection with less-virulent viruses. Some long-term non-progressors 
seem to have CD8 cells that are particularly adept at curtailing HIV 
infection. (In most AIDS patients CD8 cells become less active.) Several 
investigators, including Jay Levy (University of California, San 
Francisco), are evaluating the CD8 cells of long-term survivors to see if 
they secrete an antiviral protein or proteins that may act against HIV. 

Genetic Variation Among HIV 

There are five major subtypes of HIV, designated A through E. 
Different subtypes predominate in different geographical areas. For 
example, subtype B is more common in North America. In contrast, 
subtype C predominates in sub-Saharan Africa. Considerable variation 
within a given subtype also exists. In fact, any given individual infected 
with HIV will harbor multiple variants of the virus. HIV makes many 
mistakes as it copies its viral RNA to the DNA that integrates into the 
host's chromosome. Because of sloppy copying by reverse 
transcriptase, HIV's mutation rate is high, causing great variability. This 
large number of variants makes the virus more difficult to treat and 
hinders vaccine development. In addition, because of its rapid rate of 
evolution, even within a single individual, HIV can quickly evolve 
resistance to the drugs the individual is taking to combat the virus. 
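
A rough calculation suggests why error-prone copying produces so many variants. The Python sketch below is an illustration, not a measurement: the genome length and the per-nucleotide error rate are assumed round figures that do not come from this text, and the model treats every copying error as independent.

```python
# Assumed, illustrative values (not taken from the text):
GENOME_LENGTH = 9_700   # nucleotides in one copy of the viral genome
ERROR_RATE = 3e-5       # copying errors per nucleotide per replication


def expected_mutations(length: int, rate: float) -> float:
    """Average number of new mutations in one copied genome."""
    return length * rate


def prob_at_least_one_mutation(length: int, rate: float) -> float:
    """Chance that a copied genome differs from its template at all."""
    return 1 - (1 - rate) ** length


print(expected_mutations(GENOME_LENGTH, ERROR_RATE))
print(prob_at_least_one_mutation(GENOME_LENGTH, ERROR_RATE))
```

Under these assumptions roughly one copied genome in four carries at least one new mutation. Because an infected person produces enormous numbers of new viruses, even this modest per-copy figure yields a highly varied viral population.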

Treatments Based on Understanding the Viral Life Cycle 

Treating viruses is always difficult because viruses use the translational 
machinery of the host cell. Most drugs that target the virus also 
damage the host. Drugs that can inhibit enzymes specific to the virus 
are, therefore, less likely to cause side effects in the host. 

Most common anti-HIV drugs block key steps in viral reproduction and 
uptake. Several anti-retroviral drugs work by interfering with reverse 
transcriptase, the key enzyme of retroviruses. These drugs, the reverse 
transcriptase inhibitors, act when cells first become infected. Included 
in this group are the nucleoside analogs, chemicals that are similar to 
one of the bases (adenine, cytosine, guanine, and thymine) that 
make up DNA, but different enough to block viral DNA 
synthesis. There are also non-nucleoside reverse transcriptase inhibitors 
that can bind to reverse transcriptase and, thus, block the production 
of viral DNA. Reverse transcriptase inhibitors have been remarkably 
successful in preventing the spread of HIV from an infected mother to 
her newborn: if a pregnant woman treated with AZT (a nucleoside 
analog) delivers her child by caesarian, the chances of the baby being 
infected can be reduced to one percent. 

Protease inhibitors, another major class of drugs, act later in the life 
cycle of the virus by inhibiting the protease enzyme. These drugs 
interfere with the cleavage of the viral polypeptide into functional 
viral enzymes. 

The evolution of HIV variants that are resistant to the more commonly 
used medications has become a major problem. In one study as many as 
thirty percent of HIV patients harbored resistant viruses. The virus 
mutates rapidly, and variants that are able to survive in the presence of 
drug — particularly when circulating levels of the drug are lower — 
rapidly take over the population. Patient adherence to drug regimens is 
critical to reducing the emergence of resistant viruses; even the timing 
of medication can be important. Unfortunately, given the side effects of 
current treatments, adherence is difficult. Protease inhibitors can cause 
nausea and diarrhea, and some of the nucleoside reverse transcriptase 
inhibitors can cause red or white blood cell levels to drop. Painful nerve 
damage and inflammation of the pancreas can also result. 

HAART 

Beginning in the mid-1990s, an increasing number of HIV-infected 
individuals began a drug regimen called highly active antiretroviral 
therapy (HAART), a combination of three or more anti-HIV drugs 
taken at the same time. The simultaneous intake of multiple drugs, 
each targeting different aspects of the viral life cycle, circumvents the 
ability of the virus to mutate and become resistant to the drugs. 
Combined therapies, often called "cocktails," can knock the virus back to 
undetectable levels and improve patient health significantly. With the 
advent of HAART, deaths from HIV began to decline in the U.S. in 1997. 
Unfortunately, HAART has several long-term side effects including 
kidney, liver, and pancreatic problems; and changes in fat metabolism, 
which result in elevated cholesterol and triglyceride levels and an 
increased risk for strokes and heart attacks. In addition, some viruses 
have evolved resistance to HAART. Given these side effects, some 
physicians recommend that HAART be delayed until HIV-positive 
patients are exhibiting clear signs of AIDS. Still, HAART is often 
recommended in the first few weeks after exposure to bring the initial 
viral load down. 
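
The logic behind taking several drugs at once can be sketched with simple probability. The Python example below is a simplified illustration: the per-drug probability is a hypothetical number chosen for the example, the function name is an assumption, and resistance to each drug is treated as independent, which ignores mutations that confer resistance to more than one drug.

```python
import math


def joint_resistance_probability(per_drug_probabilities):
    """Chance that one virus resists every drug, assuming independence."""
    return math.prod(per_drug_probabilities)


p = 1e-4  # hypothetical chance that a virus resists any single drug
for n_drugs in (1, 2, 3):
    print(n_drugs, joint_resistance_probability([p] * n_drugs))
```

With each added drug the chance that a single virus resists the whole combination shrinks by another factor of p. This is consistent with the observation that combination therapy slows the emergence of resistance, although, as the text notes, resistant viruses can still evolve.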

The treatments described above are directed at the reduction of free 
virus: they work only against viruses that are actively produced. 
Because of the latent nature of the virus they are not cures. In 
addition, treatments are prolonged and may be necessary for a 
patient's entire life. A patient who stops treatment will typically have 
an increase in viral numbers. 

Also under investigation are treatments that take advantage of our 
understanding of the process of viral infection. "Fusion" or "entry 
inhibitors" block the proteins involved in viral uptake, such as CCR5. 
Integrase inhibitors affect the enzyme necessary for the integration of 
viral DNA into host DNA. Both have shown promise. 

Treatments Based on Understanding the 
Immune System 

Development of novel treatments for HIV also depends on an 
understanding of the choreography of chemical signals that regulate 
immune function. Because cellular immunity is key to clearing viral 
infections, increasing the T cell response is critical to clearing HIV. 
Interleukin 2 (IL-2) is a cytokine produced by helper T (TH) cells that promotes the 
growth of other T cells. Recombinant IL-2, which has the same activity 
as the native protein, has been shown to increase CD4 cell numbers in 
individuals in the early stages of HIV infection. Viral numbers, though, 
do not seem to go down with this treatment alone. However, IL-2 
administered with HAART resulted in more individuals with 
undetectable viral loads when compared to treatment with HAART 
alone. One frustration with HIV treatments is the inability to affect 
cells that harbor provirus. IL-2 administered intermittently to patients 
with more advanced HIV could work to stimulate viral production and 
stimulate HIV specific immune responses. Such strategies are under 
investigation. 

Other treatments under consideration target virally infected cells. 
Some CD8 cells seem to secrete soluble factors that suppress HIV 
replication. Understanding how these factors work may help define 
new treatments. 

The Challenges of Vaccine Development 

Scientists have taken a number of approaches to the development of a 
vaccine for HIV, but the nature of the virus presents significant 
challenges. HIV infects only humans and chimpanzees. Evaluating 
vaccine effectiveness in the chimpanzee model is problematic for 
several reasons. Chimpanzees are scarce, expensive, and do not show 
signs of disease when infected. There are also ethical concerns raised 
because chimpanzees are our closest evolutionary relatives. An 
alternative is the development of a monkey model using simian 
immunodeficiency virus (SIV) that has been genetically engineered to 
express HIV components. The downside to this approach is the 
difficulty of predicting what will happen when a vaccine that was 
developed using monkey models is administered to humans. 

The route of transmission of HIV also presents a challenge for vaccine 
producers. Typically, an individual is exposed to the virus at a mucosal 
surface where a particular type of antibody molecule, IgA, mediates 
immunity. The ideal vaccine should stimulate production of this type of 
antibody, not just the type found in the circulation (IgG). But even if a 
vaccine stimulates the production of the appropriate type of antibody, 
an increasing number of investigators are convinced that it may not be 
enough. Circulating antibodies cannot clear a latent virus, and infected 
cells seem to persist in the body for long periods. So it may be 
necessary to stimulate cellular as well as humoral immunity. Another 
challenge to vaccine production is the variety of viral subtypes. Because 
distinct HIV subtypes are more prevalent in certain locations, some 
scientists have asked whether HIV vaccines need to be developed 
specifically for certain geographical regions. Alternatively, immune 
stimulation must be accomplished using an antigen, or antigens, 
common to all subtypes. 

Another major impediment to vaccine development is HIV's rapid 
mutation rate and the presence of multiple viral variants within a 
given individual. Traditional vaccines, such as those for childhood 
illnesses, consist of live attenuated (weakened) pathogens, dead 
pathogens, or parts of organisms. Attenuated HIV vaccines are not 
likely to be pursued because of the risk of infection — whole, killed 
HIV is a safer alternative. But, given the rapid mutation rate of the 
virus, many believe that a variant of the virus unaffected by the 
immune response would evolve quickly. 

Vaccines based on pieces of HIV are safer and easier to prepare. Many 
efforts have been directed to the production of recombinant HIV 
proteins that can serve as vaccines. For example, vaccines consisting of 
the gp120 surface protein, which is needed for virus to adhere to cells, 
could elicit an immune response, inhibiting viral adherence. 
Unfortunately, gp120 vaccines may not be successful: the site on gp120 
that binds CD4 and CCR5 is apparently buried in a molecular pocket, 
which is not blocked by antibody. 

The AIDS epidemic has spurred additional vaccine production 
strategies that use genetic engineering techniques. Many scientists are 
examining strategies for generating cellular and humoral immunity; 
for example, live non-pathogenic bacteria or viruses can be engineered 
to express HIV antigens. Researchers at Merck Corporation have 
inserted the gag gene, which encodes a viral core protein, into 
modified adenovirus. They hope that as cell-mediated immunity is 
mounted against adenovirus, the response will also target HIV-infected 
cells. The protein encoded by gag is among those found unchanged in 
most HIV variants; therefore, researchers hope that the vaccine could 
circumvent the genetic variability problem. 

The gag gene is also the basis of one of several DNA-based vaccines 
under investigation. Such vaccines contain "naked" DNA (not 
associated with chromosomes or other structures), which is injected 
directly into muscle tissue. The expectation is that some of the DNA 
will be taken up and expressed by human cells. Researchers hope that the 
immune response directed against these cells will carry over to 
HIV-infected cells. 
Some investigators are combining strategies; for example, Harriet 
Robinson and her colleagues at Emory University are trying DNA 
priming, followed by a booster of recombinant pox virus. 

Clinical Trials 

More than two dozen experimental HIV vaccines are being studied 
worldwide. For a given vaccine to be proven safe and effective it must 
pass through three stages of human testing. Phase I addresses safety 
and dosage, and involves the administration of the vaccine to dozens 
of people. Phase II examines efficacy, the ability of the vaccine to elicit 
an immune response, and involves hundreds of people. Phase III 
involves thousands of people who are followed for long periods to 
establish that the vaccine is indeed protective. 

At the outset of the AIDS epidemic some scientists anticipated the 
availability of a vaccine in two or three years. More than twenty years 
into the epidemic, a vaccine is still down the road and few believe it 
will be available soon. The idea of distributing a less-than-perfect 
vaccine is controversial. Some believe protecting only a certain 
percentage of the population could limit the spread of the disease. 
Others believe an imperfect vaccine could provide a false sense of 
security such that individuals might increase risky behaviors. 

Social Obstacles to Controlling HIV 

Researchers have worked diligently and gained an unprecedented 
knowledge of the biology of HIV and its interaction with the immune 
system; yet, the AIDS pandemic will continue for years to come. 
Obstacles to AIDS prevention and control lie not only in the nature of 
HIV but also in the very nature of human societies worldwide. 
Poverty and discrimination exclude those most in need from 
information and treatment. The control of HIV lies not only in biology 
but also in the social realm of basic human rights. 

AIDS is having the greatest impact in countries ridden with poverty, 
where public health infrastructures are already strained by drug- 
resistant malaria, tuberculosis, yellow fever, Rift Valley fever, and other 
infectious diseases. (See the Emerging Infectious Diseases unit.) Further, 
the presence of HIV amplifies epidemics of such pathogens. AIDS is the 
leading cause of death in Africa. In several African countries, more 
than twenty percent of the 15-49-year-old population is infected with 
HIV; in Botswana more than thirty percent of that age group is 
infected. Poverty excludes millions from treatment. Of the roughly 28 
million people infected with HIV in sub-Saharan Africa, only 36,000 
received drugs in 2002. In response to such statistics drug companies 
have reduced the cost of treatment to as little as $300-$400 per person 
in developing countries (treating one person costs at least $10,000 
annually in the U.S.), but even that is too expensive. In 2001 
the United Nations launched the Global Fund to Fight HIV, Tuberculosis 
and Malaria. At the time Kofi Annan, U.N. secretary general, said it 
would take $7 billion to $10 billion each year to fight HIV/AIDS. As of 
2002 the fund, supported mainly by donor nations and philanthropists, 
had raised only $2 billion. 
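
The scale of the gap can be checked with simple arithmetic on the figures 
quoted above. This sketch assumes that the $300-$400 figure is an annual 
per-person cost, which the text implies but does not state, and it counts 
drug costs only. 

```python
# Figures quoted in the text above.
infected_sub_saharan = 28_000_000   # people with HIV in sub-Saharan Africa
treated_2002 = 36_000               # people who received drugs in 2002
cost_low, cost_high = 300, 400      # reduced cost per person (US dollars)
fund_raised = 2_000_000_000         # Global Fund total as of 2002

coverage = treated_2002 / infected_sub_saharan
untreated = infected_sub_saharan - treated_2002

# Assumption: the reduced price is a yearly cost for each untreated person.
drug_bill_low = untreated * cost_low
drug_bill_high = untreated * cost_high

print(f"Treated in 2002: {coverage:.2%} of those infected")
print(f"Drugs for everyone else: ${drug_bill_low / 1e9:.1f} to "
      f"${drug_bill_high / 1e9:.1f} billion per year")
print(f"Fund raised by 2002: ${fund_raised / 1e9:.0f} billion")
```

On these numbers roughly 0.13 percent of infected people in the region 
received drugs, and drugs alone for the remainder would cost several times 
what the fund had raised. 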

Poverty is just one obstacle to controlling HIV. Discrimination against 
particular groups has hindered education, diagnosis, and treatment. 
The lack of women's rights in some countries has thwarted 
educational efforts and contributed to the spread of the disease; so 
have prevailing customs regarding multiple sex partners. Access to basic 
education, information about HIV transmission, and the power to say 
no to unwanted sexual advances are as important as access to drugs. 
Funding for teacher training, education, and prevention materials has 
been inadequate. 

Governments in many countries have been hesitant to implement 
strong and coordinated AIDS prevention programs. Needle exchange 
programs for drug users, for example, have been shown in numerous 
studies to reduce the risk of HIV. Yet, in countries around the world, 
such programs remain politically unpopular. Condoms protect against 
transmission of the virus, but promotions of condom use are 
discouraged by many religious groups and governments. Children 
around the world are denied access to sex education, mostly for 
ideological reasons. 

By depleting the workforce, AIDS is destabilizing the economies of 
countries already grappling with poverty and political instability. As 
people in their twenties and thirties die, countries lose their workers, 
their teachers, and the parents of their children. Men who have gone 
to urban areas to work contract HIV, and then return home to give the 
disease to their wives. Much of the toll of AIDS in Africa is on the 
women and children, who are critical to maintaining the continent's 
agricultural economy. In many sub-Saharan countries women are 
considered their husbands' property and have little access to independent 
income. As men are lost to AIDS their widows become dependent on 
others or turn to one of the few survival strategies, prostitution. So 
viral dissemination is amplified and, at the same time, urban and rural 
economies decline. By the year 2011 there will be 40 million AIDS 
orphans in Africa. In some countries the illness and death of women in 
the childbearing years will result in a greatly reduced number of births 
in the next decades. 

In industrialized societies, those touched by AIDS have had 
considerable impact in fighting the spread of infection. Patients 
themselves have become active channels for the distribution of 
information and participate in policy-making and lobbying for 
funding. But HIV remains latent in those whose voices are repressed. In 
sub-Saharan Africa, for example, stigma still surrounds people infected 
with HIV, and silence impedes progress in controlling the epidemic. 

HIV is difficult to control because it exploits the immune system 
designed to stop it and other infections. Researchers are continuing to 
explore strategies to foil the virus, but treatments and vaccines are just 
two components to thwarting the epidemic. Policies that ensure 
accessibility to medications and information are critical. Education is 
key. AIDS forces us to talk about things we would rather leave unsaid. 

Glossary 



Antigen. A substance, often a 
protein or large polysaccharide, 
which is perceived as foreign by 
the body and stimulates an 
immune response. Components of 
microbes such as cell walls, 
flagella, toxins, and the coats of 
viruses can serve as antigens. 

CD4. A protein on the surface of 
certain leukocytes. Helper T cells 
are among the cell types that 
possess CD4. CD4 is involved in the 
binding of HIV to its host cells. 

Cellular immunity. The arm of 
the immune system directed 
toward antigens associated with 
cells, such as those expressed by 
virally infected cells or cancers. 
The T cells of cellular immunity 
respond specifically to such 
antigens. 

Chemokine. A chemical signal 
that attracts white blood cells to 
infected parts of the body. 

Chemokine receptor. A protein 
associated with the membranes of 
white blood cells that chemokines 
can attach to. 

Cytokine. A molecular signal that 
modulates immune response. 

dsDNA. Double-stranded DNA. A 
DNA molecule in which two chains 
(backbones of alternating sugars 
and phosphates) are linked 
together by hydrogen bonding 
between complementary bases. 

gp120. A glycoprotein with a 
molecular weight of 120,000 
daltons, which is part of the 
peg-like structures protruding 
from the surface of HIV. 



HAART (highly active 
antiretroviral therapy). A 
combination of three or more 
anti-HIV drugs used in the 
treatment of HIV. 

Humoral immunity. The arm of 
the immune system directed 
toward circulating antigens such 
as bacteria, toxins, and viruses 
that have not entered cells. 
Antibodies secreted by plasma 
cells mediate humoral immunity. 

Integrase. An HIV enzyme that 
facilitates the integration of viral 
DNA into the host cell's 
chromosome. 

Opsonin. A substance that, when 
bound to antigen, amplifies the 
normal phagocytic process. 

Protease. An enzyme that 
facilitates the cleavage of 
proteins. In HIV, an enzyme that 
allows for the processing of newly 
translated polypeptides into the 
proteins that will be assembled 
into viral particles. 

Provirus. Viral DNA that is 
incorporated into a host cell's 
chromosome. 

Reverse transcriptase. An 
enzyme derived from a retrovirus, 
which uses single-stranded RNA as 
a template for the production of 
double-stranded DNA. 

Genes and Development 



"Animals that look nothing like each other develop by using 
much the same basic 'toolkit' of molecules and often in 
much the same ways." M. Palopoli and N. Patel¹ 

Development poses some of the central questions of biology: How 
does a single cell become a complex multicellular organism like us? 
What role do our genes play in the processes of development? From 
the early decades of the twentieth century, geneticists knew about 
mutants that altered phenotypes because of the actions of various 
genes during development. In numerous cases biologists knew where 
on the chromosome the mutant gene was located and how the mutant 
allele was transmitted from parent to offspring. Nevertheless, the 
actual role the genes play in development remained a "black box" 
mystery until around 1980. 

Starting in the late 1970s geneticists figured out the details involved in 
the genetic control of development in model systems such as the 
fruit fly Drosophila melanogaster. They found that many of these 
developmental genes shared similar features. During the 1980s and 
1990s geneticists made an even more surprising discovery: the same 
principles, and often the same genes, involved in development in 
model organisms (such as fruit flies and zebrafish) are also involved in 
controlling development in most other animals, including humans. 

Differentiation and Genetic Cascades 

Development of a complex multicellular organism is more than just 
growth — we certainly do not look like gigantic fertilized eggs. 
Starting from a single cell, numerous specialized cell types emerge that 
differ in many ways: size, shape, longevity, biochemistry, and so on. 
What can account for this great diversity among cell types? What 
processes underlie this differentiation of a single cell into all the cell 
types of an adult individual? 

Is differentiation due to the loss of certain genes in some cell types? 
While there are some exceptional cases (for example, mature red blood 
cells lack nuclei), development, as a rule, is not due to particular cell 
types having different genes. With only a few exceptions, all the cells 
in your body contain the same DNA. Discoveries of adult stem cells 
show that some adult cells retain the potential to produce many, if not 
all, of the cell types in the organism. These cells can reverse the process 
of differentiation, reaching a state where their descendants can 
redifferentiate into all of the cell types. 

If cells of an individual are genetically alike, how does differentiation 
occur? Recall that proteins, not DNA, carry out most cellular functions. 
(See the Proteins and Proteomics unit.) DNA serves as a blueprint from 
which RNA is transcribed. Proteins come from the amino acid chains 
that are translated from the RNA. The levels of transcription and 
translation of a gene determine how much of that gene's protein will 
be present in the cell. Gene expression, which encompasses 
transcription and translation, is the general term to describe the 
processes in which DNA produces RNA and proteins. It can also include 
other factors, such as the rate at which RNA is degraded before it can 
be translated. Differential gene expression will result in varying 
concentrations and kinds of proteins in cells, causing them to look and 
function differently. This differential transcription and translation of 
genes ultimately allows for cellular differentiation. Thus, development 
is a program that regulates gene expression at the appropriate 
locations and times. 

How is it that, for a given cell type, certain subsets of genes are 
expressed and other genes are not expressed? As we will see later, the 
protein product that results from the expression of one gene can 
influence the expression of several other genes. In turn, the altered 
expression patterns of these genes can then influence the expression 
of an even larger number of genes. By this process, called a cascade, a 
change in one or a few genes can alter the expression patterns of 
numerous genes. 
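
The way a cascade amplifies a change in one gene can be sketched as a walk 
through a small regulatory network. The network below is hypothetical: the 
gene names and connections are placeholders chosen for illustration, not 
real developmental genes. 

```python
# Hypothetical network: each gene maps to the genes whose expression
# its protein product influences. Names are placeholders.
REGULATES = {
    "A": ["B", "C", "D"],
    "B": ["E", "F"],
    "C": ["G"],
    "D": ["H", "I"],
}


def cascade_levels(start: str) -> list[list[str]]:
    """Return the genes newly affected at each step after `start` changes."""
    seen = {start}
    frontier = [start]
    levels = []
    while frontier:
        next_frontier = []
        for gene in frontier:
            for target in REGULATES.get(gene, []):
                if target not in seen:      # count each gene only once
                    seen.add(target)
                    next_frontier.append(target)
        if next_frontier:
            levels.append(next_frontier)
        frontier = next_frontier
    return levels


for step, genes in enumerate(cascade_levels("A"), start=1):
    print(f"step {step}: {len(genes)} genes affected: {genes}")
```

In this toy network a change in gene A affects three genes directly and 
five more at the next step, so one change reaches eight genes in all. 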



The Details of Gene Expression 

What regulates gene expression? The general principles of eukaryotic 
gene regulation are now well known. Much regulation occurs during 
transcription as RNA is synthesized from the DNA template. This 
process is mediated by interactions between proteins and DNA and, 
sometimes, interactions between different proteins. Proteins called 
transcription factors bind to DNA sequences, known collectively as 
regulatory elements, located near the coding region of the gene in 
question (Fig. 1). When these proteins bind to the regulatory elements, 
they alter the transcriptional machinery and, thus, can change the level 
of transcription. In some cases the binding of transcription factors to the 
regulatory elements causes transcription to increase (up-regulation); in 
other cases it causes transcription to decrease (down-regulation). 

The invention of microarray chips in the late 1990s enabled 
researchers to observe the expression patterns of thousands of genes 
at the same time. (See the Genomics unit.) Using these chips, 
researchers can compare the genomic expression patterns of different 
cell types (such as a neuron versus a liver cell), as well as examine the 
changes in these patterns that occur as an embryo develops. With the 
microarray assays, biologists found many previously undiscovered 
genes that play a role in development. By examining groups of genes 
that have correlated changes in their expression patterns, biologists 
have inferred groups of genes that may interact in developmental 
pathways. They then use other methods to determine whether the 
hypothetical pathways actually exist. 
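
The idea of grouping genes by correlated expression can be sketched with a 
few lines of code. The expression values below are invented for 
illustration, and the Pearson correlation with a cutoff of 0.9 is one 
simple choice among many; it is not presented as the method any particular 
laboratory uses. 

```python
from itertools import combinations
from math import sqrt

# Invented expression levels at five developmental time points.
EXPRESSION = {
    "geneA": [1.0, 2.0, 4.0, 8.0, 9.0],
    "geneB": [0.5, 1.1, 2.1, 3.9, 4.6],
    "geneC": [9.0, 7.0, 4.0, 2.0, 1.0],
    "geneD": [3.0, 3.1, 2.9, 3.0, 3.2],
}


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    covariance = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    spread_x = sqrt(sum((a - mean_x) ** 2 for a in x))
    spread_y = sqrt(sum((b - mean_y) ** 2 for b in y))
    return covariance / (spread_x * spread_y)


def correlated_pairs(threshold=0.9):
    """Gene pairs whose profiles rise and fall together (or in opposition)."""
    pairs = []
    for gene1, gene2 in combinations(EXPRESSION, 2):
        r = pearson(EXPRESSION[gene1], EXPRESSION[gene2])
        if abs(r) >= threshold:
            pairs.append((gene1, gene2, round(r, 2)))
    return pairs


for gene1, gene2, r in correlated_pairs():
    print(f"{gene1} and {gene2}: r = {r}")
```

Here geneA and geneB rise together and geneC falls as they rise, so all 
three are flagged as candidates for a shared pathway, while geneD, which 
barely changes, is not. As the text notes, such candidates still have to 
be tested by other methods. 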



Figure 1. The yellow ball represents a 
transcription factor binding to DNA in 
the nucleus to affect transcription and 
translation of new proteins. 

Establishing the Gradient and 
Coordinate Genes 

Development is a process where the products of some genes turn other 
genes on or off. 

But how does the process start? Even before fertilization, development 
is occurring. We normally think of an egg as a storehouse of energy 
supply and nutrients that the embryo will use as it develops. While this 
is true, the egg also supplies information to establish a molecular 
coordinate system. This coordinate system provides a way to tell 
"which end is up"; in other words the location of the embryo's head is 
determined even before the egg is fertilized. 

Coordinate genes are so named because they establish the primary 
coordinate system for what will become the embryo. One important 
example of a coordinate gene is bicoid, which is involved in 
establishing the anterior-posterior polarity in Drosophila. How does 
bicoid do this? To understand this process we need to first discuss how 
bicoid gets to the anterior part of the egg. Nurse cells surround the 
anterior region of the egg in Drosophila and other flies. Cytoplasmic 
bridges allow various substances — in this case mRNA from bicoid — to 
be transported from the nurse cells into the egg. The bicoid mRNA is 
then trapped by proteins produced by other genes. The result is a 
concentration gradient of bicoid mRNA: the anterior end has the 
highest concentration and the posterior end lacks it (Fig. 2). 
Translation of bicoid is inhibited until after fertilization, leading to a 
bicoid protein concentration gradient. 




DROSOPHILA EMBRYO WITH BICOID PROTEIN EXPRESSED. Courtesy of Nipam Patel, PhD. 

In addition to bicoid, other coordinate genes help establish an 
anterior-posterior polarity. Still other coordinate genes allow the 
establishment of a dorsal-ventral gradient. 

Figure 2. This is a 2-hour-old 
Drosophila embryo that shows the 
expression of the bicoid protein. The 
bicoid protein forms a gradient with the 
highest expression at the anterior end 
(left side in this photo) of the embryo. 

These coordinate genes, like bicoid, are sometimes called maternal 
effect genes. Maternal effect occurs when the phenotype of the 
individual is dependent on its mother's genotype, not its own. In cases 
of maternal effect, the transmission pattern of the alleles is the same 
as in standard Mendelian genetics but the action of the gene occurs a 
generation later. For example, consider a maternal effect gene where 
the mutant allele (m) is recessive to the wild-type allele. In the cross of 
homozygous, wild-type females to homozygous, mutant males, all the 
F1 offspring are heterozygotes and appear normal. In the reciprocal 
cross, all of the F1 offspring are heterozygotes but have the mutant 
phenotype (Fig. 3). Although the F1 offspring are genotypically 
identical in the reciprocal crosses, they are phenotypically different. 
This is because the phenotype is determined by the mother's 
genotype. Maternal effect is not the same thing as maternal 
inheritance, such as in mitochondria, where the genetic material is 
transmitted only across maternal lines. 
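
The reciprocal crosses described above can be written out as a short 
sketch. The function names and the "+" symbol for the wild-type allele are 
conventions chosen here for illustration. 

```python
def f1_genotype(mother: str, father: str) -> str:
    """Genotype of F1 offspring from a cross of two homozygous parents.

    Each homozygous parent contributes one copy of its only allele.
    """
    alleles = sorted([mother[0], father[0]])   # "+" sorts before "m"
    return "".join(alleles)


def maternal_effect_phenotype(mother: str) -> str:
    """Phenotype of offspring for a recessive maternal effect mutation.

    Only the mother's genotype matters: offspring are mutant when the
    mother is homozygous for the mutant allele (m).
    """
    return "mutant" if mother == "mm" else "normal"


crosses = [("++", "mm"), ("mm", "++")]   # (mother, father)
for mother, father in crosses:
    print(f"{mother} female x {father} male: "
          f"F1 genotype {f1_genotype(mother, father)}, "
          f"phenotype {maternal_effect_phenotype(mother)}")
```

Both crosses give the same F1 genotype, yet the phenotype differs because 
the function that assigns the phenotype never looks at the offspring's own 
genotype. 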

Responses to the 
Concentration Gradient 

Coordinate genes such as bicoid lay down the grand plan, so to 
speak, upon which the genes downstream will act. The pattern of 
the developing embryo arises as these downstream genes are 
activated or repressed. 

Like many of the other coordinate genes, bicoid encodes a 
transcription factor; thus, there is a concentration gradient of a 
transcription factor. The next genes in this developmental cascade, 
the "gap genes," possess binding sites for this transcription factor. 
Gap genes are so named because mutations in these genes can 
produce larvae with "gaps" (missing several segments). These genes 
differ in how many bicoid binding sites they have and, thus, vary in 
their sensitivity to this transcription factor. Some gap genes will 
become active at low concentrations of bicoid, while the activation 
of others will require higher concentrations. Due to the 
concentration gradient, different regions of the developing embryo 
will activate different gap genes. 
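
A minimal sketch can show how a single gradient switches on different genes 
at different positions. Everything numerical here is an assumption made for 
illustration: the exponential shape of the gradient, the decay length, the 
gene names, and the threshold values. A gene with more bicoid binding sites 
is modelled simply as having a lower activation threshold. 

```python
import math


def bicoid_level(position: float, decay_length: float = 0.2) -> float:
    """Relative bicoid concentration along the embryo.

    position runs from 0.0 (anterior) to 1.0 (posterior). The exponential
    form and the decay length are illustrative assumptions.
    """
    return math.exp(-position / decay_length)


# Hypothetical gap genes and the minimum bicoid level each requires.
THRESHOLDS = {"gapHigh": 0.5, "gapMid": 0.2, "gapLow": 0.05}


def active_genes(position: float) -> list[str]:
    """Genes whose activation threshold is met at this position."""
    level = bicoid_level(position)
    return [gene for gene, needed in THRESHOLDS.items() if level >= needed]


for position in (0.0, 0.2, 0.4, 0.8):
    print(f"position {position:.1f}: bicoid {bicoid_level(position):.2f}, "
          f"active: {active_genes(position)}")
```

In this toy model all three genes are active at the anterior end, fewer are 
active farther back, and none are active near the posterior end. The sketch 
considers bicoid alone and ignores any other inputs to the gap genes. 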

Unlike the coordinate genes, the gap genes are not maternal effect 
genes. The activities of the embryo's gap genes (and not those of the 
mother's genes) determine the phenotype. Gap genes also encode 
transcription factors, and these affect the transcription of genes that 
further refine the patterning of the Drosophila embryo (Fig. 4). 



Figure 3. Reciprocal F1 crosses 
involving maternal effect genes can 
produce different phenotypes. 

Figure 4. The cascade of 
developmental genes in segmentation 
in Drosophila. Maternal effect/ 
coordinate genes set the anterior- 
posterior axes. The embryo is 
subdivided into progressively smaller 
regions by the actions of each class of 
segmentation genes. 

Homeotic Genes 

At the end of this cascade is a class of genes that have a long history 
among Drosophila researchers. Decades before Watson and Crick 
ascertained the structure of DNA, and even more decades before 
geneticists understood the principles of gene expression, biologists 
were using Drosophila melanogaster as a model system for studying 
the transmission of genetic traits from parent to offspring. Let's go 
back to 1915 at Columbia University: In a small laboratory, crowded 
with thousands of milk bottles containing stocks of the tiny fruit fly 
Drosophila melanogaster, Thomas Hunt Morgan, the father of 
Drosophila genetics, and his students worked. They examined these 
fruit flies, focusing on ones that looked different, in their quest to find 
and map genes. 

One day Calvin Bridges, one of Morgan's graduate students, discovered 
a most unusual fly. One of the hallmark features of flies is that they 
have two wings; Diptera, the insect order to which flies belong, means 
"two wings." The fly Bridges found had one pair of normal wings and 
one pair of somewhat developed wings. Four wings! Bridges found 
that this "four wing" phenotype was a genetic mutation that mapped 
to the third chromosome. After closer inspection, Bridges noted that 
the third segment of the thorax in these flies looked a good deal like a 
normal, second segment of the thorax (where wings normally grow). 
He consequently named the gene associated with this mutant 
phenotype "bithorax." (Genes in Drosophila are traditionally named 
for their mutant phenotype, not for what they do in normal flies.) 

Drosophila geneticists would later find other, similar mutations. One, 
named ultrabithorax, caused the fly to form two completely 
developed pairs of wings. Another seemingly different mutation 
(antennapedia) caused legs to grow where the fly's antennae should 
have been (Fig. 5). These mutant genes came to be referred to 
collectively as homeotic genes, named after homeosis. Homeosis, a 
term coined by William Bateson (a prominent zoologist and one of the 
early geneticists), refers to "cases in which structures belonging to one 
body segment were transformed in identity to those belonging to 
another segment."² Mutants in these genes appeared to change the 
characteristics of one segment of the fly into those of another 
segment. Interestingly, all of these genes would map very close 
together in two clusters on the third chromosome. 

Recall the cascade that led to these homeotic genes. The maternal 
effect coordinate genes laid down the anterior-posterior and dorsal- 
ventral gradients, which influenced the expression of genes further 
along the cascade. These genes turned other genes on or off and, as a 
result, formed the segmented pattern of the Drosophila embryo. The 
homeotic genes, having been turned on or off by genes above in the 
cascade, are also transcription factors. They influence the expression of 
numerous other genes and, by doing so, determine the identity of the 
segment they are in. Certain homeotic genes, such as bithorax, are 
expressed in what would become the thorax; other genes are 
expressed only in the head or abdomen. It's interesting to note that 
genes expressed in similar regions are also located near each other on 
the chromosome (Fig. 6). 



Figure 5. A scanning electron 
microscope image of a Drosophila fly 
with the antennapedia mutation. This 
mutation causes the fly to grow legs 
where it should grow antennae. 




MUTANT DROSOPHILA. 
Courtesy of Thomas Kaufman, PhD, University of Indiana. 

Figure 6. Genes that are expressed at 
the anterior end of an animal are 
located at the more anterior region of 
the chromosome. Likewise, posteriorly 
expressed genes reside on the posterior 
end of the chromosome. This is 
referred to as spatial colinearity. 

Cell Lineage Mapping and C. elegans 

Drosophila melanogaster is not the only model organism for 
developmental genetic studies. Starting in the 1960s geneticists 
interested in developmental questions turned to a free-living soil 
nematode, Caenorhabditis elegans. This species, usually referred to as 
just C. elegans, has several features that Drosophila and most other 
organisms don't have, which makes it attractive for developmental 
studies. Because embryos of this nematode are transparent, their cells 
can be observed easily and without much manipulation. The species 
also has a low number of cells. In fact, all normal individuals have the 
same number of cells: 959 somatic cells in the hermaphrodite and 1,031 
in the male. Unlike Drosophila and mammals, which have extensive cell 
movement during development, the cells of C. elegans do not move 
very much during development. All of these features made C. elegans 
an ideal organism to study cell lineage history, the ancestral- 
descendant relationship of cells. 
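
A cell lineage is naturally represented as a tree in which each cell points 
to its two daughters. The sketch below covers only the first few divisions 
of the embryo, using the standard founder-cell names for C. elegans; those 
names are not given in this text and are supplied here for illustration. 

```python
# Early divisions of the C. elegans embryo (standard founder-cell names).
DAUGHTERS = {
    "P0": ("AB", "P1"),
    "P1": ("EMS", "P2"),
    "EMS": ("MS", "E"),
    "P2": ("C", "P3"),
    "P3": ("D", "P4"),
}

# Invert the table so each cell points back to its parent.
PARENT = {
    daughter: parent
    for parent, pair in DAUGHTERS.items()
    for daughter in pair
}


def ancestry(cell: str) -> list[str]:
    """Trace a cell back to the fertilized egg (P0)."""
    path = [cell]
    while path[-1] in PARENT:
        path.append(PARENT[path[-1]])
    return path


def descendants(cell: str) -> list[str]:
    """List every cell in the table that descends from `cell`."""
    found = []
    for daughter in DAUGHTERS.get(cell, ()):
        found.append(daughter)
        found.extend(descendants(daughter))
    return found


print(" <- ".join(ancestry("E")))
print(descendants("P1"))
```

The same two questions, which cell did this one come from and which cells 
will it give rise to, are the ones a complete lineage history answers for 
every cell in the animal. 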

John Sulston and colleagues worked out the entire cell lineage history 
of C. elegans by 1983. Some cell lineage mutations alter the rate 
and/or timing of cell division. Others affect differentiation. One 
remarkable feature of C. elegans development is that seventeen 
percent of the cells generated during embryogenesis undergo 
programmed cell death, also called apoptosis. Normal development 
requires that certain cells die. In several mutants, the failure of 
specific cells to die has been tied to a phenotypic change. Many 
of the genes involved in programmed cell death in nematodes have 
counterparts in vertebrates that are also responsible for programmed 
cell death. Moreover, absence of proper cell death is a key feature of 
many cancers. (See the Cell Biology and Cancer unit.) 

Fate Maps 

What Sulston and his colleagues did with tracing the entire cell lineage 
would be exceedingly difficult for the vast majority of organisms. Most 
multicellular organisms have far more cells than C. elegans. Moreover, 
most don't have a transparent body or rather sedentary cells during 
development. Nevertheless, for several different kinds of organisms, 
researchers have been able to determine the type of tissue that cells in 
developing embryos will become; fate maps are diagrammatic 
representations of this (Fig, 7). 





Figure 7. 

Left: A photograph of an early stage 
blastula from the Xenopus laevis frog. 
Right: A representation of a fate map. 



Courtesy of Dr. Anna Philpott, Department 
of Oncology, Cambridge University. 
Scientists have been able to create fate maps for several organisms, 
such as the sea urchin, since the early decades of the twentieth 
century. To construct fate maps researchers use various methods, 
including removing cells from embryos. If the adult that developed 
from these embryos is missing specific tissues, researchers infer that the 
removed cells would have become those missing tissues. Researchers 
can also use a variety of stains to trace cells in the developing embryo. 

Cell-Cell Communication and 
Signal Transduction 

Although development begins with a "master plan," initiated by 
coordinate genes and carried on through a series of genetic cascades, 
cells also communicate with one another to coordinate development. 
In addition, cell-cell communication is essential throughout the life of 
the organism. Indeed, many cancers are due in part to failures of 
normal cell-cell communication. (See the Cell Biology and Cancer unit.) 

There are some similarities between the way cells communicate and 
the way individual organisms communicate: in both cases there are 
signalers and receivers. Cell-cell communication, like many forms of 
communication between organisms, involves the transfer of 
information by using molecules between signalers and receivers. The 
signaling cell sends out molecules called ligands; these can be proteins 
or small molecules such as vitamin D. Ligands attach to proteins 
embedded in the membrane of the receiver cell; these proteins are 
sometimes called receptor proteins. 

Once the receptor protein receives the message (the ligand), the 
nucleus still needs to receive the information because that's where 
transcription occurs. How does that happen? Most often, the binding 
of the ligand causes the receptor protein to change its conformation. 
This conformational change sets up a series of changes, and sometimes 
cascades, which eventually lead to changes in transcriptional activity of 
genes. 

One example of a signaling pathway involves the "hedgehog" gene in 
Drosophila. This gene was so named because larvae with the mutant 
phenotype are covered with hair and look somewhat like a hedgehog. 
The protein encoded by the hedgehog gene is a ligand and interacts 
with several receptors. Among other functions, it triggers the early 
steps in development of postsynaptic neurons. It is also involved in the 
differentiation of the photoreceptor cells of the eye. 

Conservation of the Homeobox 

In the early 1980s Drosophila geneticists started sequencing the DNA 
from the homeotic genes. Much to their surprise they found that all 
the homeotic genes contained a 180-basepair region. This region, 
named the homeobox after the genes in which it was first found, 
encodes a sixty-amino-acid sequence that is very well conserved 
among the homeotic genes. Homeobox refers to the sequence of DNA; 
the amino acid sequence it encodes is called a homeodomain. 
Sequences at the homeobox usually differ by ten percent or less 
between pairs of homeotic genes in Drosophila. Homeoboxes are not 
restricted to homeotic genes and have been found in several other 
classes of developmentally important genes. The amino acids encoded 
by the homeobox region contain a motif called a helix-turn-helix, 
which is associated with binding to DNA sequences. Thus, a gene with 
a homeobox would be a prime candidate for a gene that encodes a 
transcription factor. 
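
The relationship between the 180-basepair homeobox and the sixty-amino-acid homeodomain follows from the triplet genetic code: three nucleotides specify one amino acid. A minimal sketch of that arithmetic in Python (the function name is hypothetical, introduced only for illustration):

```python
CODON_LENGTH = 3  # three nucleotides encode one amino acid


def amino_acids_encoded(num_base_pairs: int) -> int:
    """Return how many amino acids a coding region of this length encodes."""
    if num_base_pairs % CODON_LENGTH != 0:
        raise ValueError("coding region must be a whole number of codons")
    return num_base_pairs // CODON_LENGTH


homeobox_length = 180  # base pairs, as stated in the text
print(amino_acids_encoded(homeobox_length))  # 60
```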

More surprising than the discovery of homeoboxes themselves was 
their ubiquity. Soon after homeoboxes were found in Drosophila, 
William McGinnis and his colleagues went on a "fishing" expedition 
looking for homeotic genes. They looked in a variety of organisms 
using a method called "zoo blotting," a modified type of Southern 
blot. (See the Genetically Modified Organisms unit.) The process 
consisted of using gel electrophoresis to separate the DNA from each 
species of interest by size. The DNA was then heated to 
separate it into single-stranded DNA (ssDNA). Next the ssDNA was 
blotted and trapped on nitrocellulose filter paper. The researchers then 
added single-stranded homeobox DNA, which had been labeled with a 
radioactive isotope, to the filter paper. If the ssDNA on the blot was 
sufficiently similar to the labeled homeobox ssDNA, the two ssDNAs 
would hybridize on the filter paper. The filter paper would be 
radioactive wherever there was hybridization. To their surprise, 
McGinnis' group found homeobox sequences everywhere — in insects, 
crustaceans, vertebrates (including humans and mice), echinoderms, 
and mollusks. Almost all multicellular animals had genes with 
homeoboxes. Moreover, they all expressed these genes during 
development, often in very similar ways. 
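
The logic of the zoo blot can be sketched as a toy model in Python: a labeled probe gives a signal in a lane only if the DNA in that lane is sufficiently similar to it. Everything below is an illustrative assumption, not the researchers' procedure: the sequences are invented, the similarity threshold is arbitrary, and for simplicity the probe is compared directly with a same-sense strand rather than pairing with its complement.

```python
def fraction_identical(probe: str, target: str) -> float:
    """Fraction of positions at which two equal-length sequences match."""
    if len(probe) != len(target):
        raise ValueError("sequences must have the same length")
    matches = sum(a == b for a, b in zip(probe, target))
    return matches / len(probe)


def zoo_blot(probe: str, lanes: dict, threshold: float = 0.8) -> list:
    """Return the lanes (species) in which the labeled probe would hybridize."""
    return [
        species
        for species, dna in lanes.items()
        if fraction_identical(probe, dna) >= threshold
    ]


# Hypothetical sequences, made up for illustration only.
probe = "ATGCGTACGA"
lanes = {
    "species_A": "ATGCGTACGA",  # identical to the probe
    "species_B": "ATGCGTACCA",  # one mismatch
    "species_C": "TTTTTTTTTT",  # unrelated sequence
}
print(zoo_blot(probe, lanes))  # ['species_A', 'species_B']
```

In the real experiment the "signal" is radioactivity on the filter paper, and the stringency of hybridization plays the role of the threshold used here.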

Most invertebrates have a single cluster of homeotic genes. In 
Drosophila that cluster is broken in two. Vertebrates have four copies 
of the cluster, strongly suggesting that the cluster had been duplicated 
twice in vertebrates. 
Conservation of the "Control Switch" 
Gene for Eyes 

Phylogenetic analysis has shown that eyes have independently evolved 
dozens of times in the history of life. (See the Evolution and 
Phylogenetics unit.) For example, there are striking differences 
between the eyes of insects and those of vertebrates. Vertebrates have 
a camera eye, consisting of a light-sensitive retina, a lens, and a series 
of muscles used for adjusting focus. In contrast, insects have compound 
eyes, consisting of numerous light-sensitive ommatidia. 

Biologists have learned about the genetics of the visual system in 
insects by studying mutations that affect eyes in Drosophila. Mutants 
of the eyeless gene in Drosophila have reduced eye size, with the 
extent of the reduction depending on the allele. The eyeless gene is 
normally expressed only in the tissues that become the eyes. Recall that 
genes in Drosophila are named for the phenotypes of their mutations 
and not their normal function. What is remarkable about eyeless is 
that its expression can induce eyes to grow where they ordinarily 
don't. Members of Walter Gehring's lab in Switzerland created 
transgenic flies, which could express the eyeless gene in various places 
in the developing fly. By expressing eyeless where it is normally not 
present (ectopic expression), they were able to produce flies with 
eyes on their antennae, legs, wings, and various other places. So, 
eyeless looks like a control switch gene for making eyes (Fig. 8). 

These same researchers also used databases to search for homologous 
genes of eyeless in mammals. (See the Genomics unit.) They found 
that the eyeless gene in Drosophila was strikingly similar (more than 
ninety percent sequence identity) to the Pax-6 gene in mammals. This 
Pax-6 gene is also called Small eye in mice (where mutants have small 
eyes) and Aniridia in humans (where mutations lead to deficient 
development of the iris). 

Now here's the really fascinating part! Gehring's lab did the same 
ectopic expression experiment but with the mammalian homologue of 
eyeless. They produced flies with eyes on their antennae, legs, wings, 
and various other places. The eyes produced were the compound eyes 
of flies but the machinery for making these eyes could be turned on by 
mammalian eyeless protein. Despite the independent evolution of eye 
structure and over 550 million years of independent evolution, the 
"control switch" for eye development has been conserved. 

There are differences between the role eyeless plays in flies and 
mammals. Unlike in Drosophila, where eyeless is not required for 
viability, homozygotes for the deletion of eyeless are inviable in 
mammals. Furthermore, this gene is expressed in regions of the 
mammalian forebrain. This is strong evidence that eyeless has 
functions in addition to eye development. 



Figure 8. The head of a fruit fly, 
Drosophila melanogaster, viewed by 
scanning electron microscope (380x 
magnification). Targeted expression of 
the eyeless gene induced the formation 
of the eye facets on the antenna (to 
the lower-right of the eye), which are 
very similar to the facets of the normal 
eye. This identifies eyeless as the master 
control gene of eye morphogenesis. 




Andreas Hefti and Georg Halder, HEAD OF A FRUIT FLY (1995). 
Courtesy of Science magazine, cover, 24 March 1995. 



Sonic Hedgehog 

Researchers discovered that vertebrates have a homologue to the 
Drosophila hedgehog gene. They named the vertebrate homologue 
"Sonic Hedgehog" after the video game character Sonic the 
Hedgehog. This gene, which encodes a ligand, has diverse functions, 
including limb development, patterning of the neural tube (and 
hence the brain), and differentiation of regions in the gut. How does it 
work? Cells of the developing notochord send out Sonic Hedgehog 
signals to the spinal cord. Spinal cord cells respond to the signal and 
then differentiate into the ventral part of the spinal cord, which makes 
the motor neurons that permit muscular activity. Across mammals this 
gene is highly conserved; the mouse and human Sonic Hedgehog 
proteins are ninety-two percent identical at the amino acid level. 



A Brief Look at Plant Development 

Despite evolving multicellularity independently, plants and animals 
share some common features in their respective development. These 
shared features include homeotic mutations and the use of 
transcription factors. Research in plant development also started with 
model organisms, in this case the mustard weed Arabidopsis 
thaliana. In Arabidopsis and other plants, the developing flower 
comprises four concentric whorls. The outermost whorl (Whorl 1) is 
fated to become the sepals, the outer floral leaves. It surrounds Whorl 
2, which is fated to become the petals, the white inner floral leaves. 
Whorl 3 is fated to become the stamens, the male organs. 
The innermost whorl (Whorl 4) is fated to become the carpels, which 
will form the ovary (Fig. 9). 

There are several homeotic mutations in flowers where different 
parts replace others. For example, in one class of mutations, sepals 
develop where petals should and carpels develop where stamens 
should. These mutations have been identified as defects of a family 
of genes that all encode a particular class of transcription factor, 
called the MADS box family. MADS box transcription factors occur in 
plants and, to a lesser extent, in animals and contain a 
conserved fifty-eight amino acid sequence. 
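
The whorl-to-organ assignments and the class of homeotic mutation described above can be written out as a small lookup table. This is only a bookkeeping sketch in Python; the names are hypothetical and the code says nothing about the underlying MADS box genes.

```python
# Wild-type fate of each floral whorl, numbered from the outside in.
WILD_TYPE_FATES = {1: "sepals", 2: "petals", 3: "stamens", 4: "carpels"}


def apply_homeotic_mutation(fates: dict) -> dict:
    """Apply the class of mutation from the text: sepals develop where
    petals should, and carpels develop where stamens should."""
    replacement = {"petals": "sepals", "stamens": "carpels"}
    return {whorl: replacement.get(organ, organ) for whorl, organ in fates.items()}


mutant = apply_homeotic_mutation(WILD_TYPE_FATES)
for whorl in sorted(mutant):
    print(f"Whorl {whorl}: {WILD_TYPE_FATES[whorl]} -> {mutant[whorl]}")
```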

Plants and animals differ in one important feature: the maintenance of 
totipotent cells. Cells that, like the fertilized egg, can make all of 
the cells of the organism are said to be totipotent. In the process of 
animal development, the competence of cells to become different 
cell types declines. As cells become more differentiated, they 
continue to lose competence: pluripotent cells can 
produce most, but not all, types of cells, while multipotent cells can 
produce only a defined set of mature cells. Plants, however, have an 
apical meristem located at the tip of every root and stem that remains 
totipotent. They have other meristems that are also totipotent. 
Moreover, under the right conditions many differentiated plant cells 
are able to "de-differentiate to the embryonic state and subsequently 
redifferentiate to new cell types." 3 

Stem Cells 

Certainly some plant cells, like the totipotent meristems, are more 
versatile than animal cells. Recent discoveries, however, show that the 
difference in the retention of competence between animal and plant 
cells is not as great as once thought. During the late 1990s scientists 
found that adult humans have a reservoir of cells that retain some 
ability to become other cell types. 

Cells derived from fetal tissue have been used to generate so-called 
embryonic stem cells. In addition to the ethical dilemmas raised by the 
source of embryonic stem cells, there are practical limitations to the 
use of these cells for treating and curing diseases and regenerating 
tissues. Because the donors of these cells are immunologically different 
from the recipient, immunosuppression would have to be used as it is 
in organ transplantation. Because adult stem cells can be derived from 
the individual patient, concerns about compatibility of the cells would 
be obviated. 

Figure 9. The tissues that will become 
floral organs are arranged in concentric 
whorls of a developing flower. 

But do adult stem cells have the same ability to differentiate as 
embryonic stem cells? Recent studies suggest that adult stem cells may 
be more versatile than had been previously thought. Catherine 
Verfaillie and her colleagues at the University Stem Cell Institute 
derived what they call Multipotent Adult Progenitor Cells (MAPC) from 
the bone marrow of adult mice. These cells appear to be able to 
differentiate into virtually all cell types of the mouse when injected into 
mouse blastocysts. These MAPCs have also been injected into living 
adult mice and have differentiated into liver, lung, and intestine tissue. 

Coda 

The fact that the same principles and many of the same genes direct 
the development of such different and diverse animals has generated 
renewed interest and study of how developmental systems evolve. 
Given the striking similarity of genes used, how do the manifest 
differences across animals arise and evolve? This question will keep 
biologists busy for many years to come. 
Glossary. 



Coordinate genes. Genes that 
set the coordinate system, the 
primary anterior-posterior and 
dorsal-ventral axes, of the early 
embryo. 

Developmental pathway. A 
sequence of genes that underlie a 
developmental process. 

Differentiation. The process by 
which cells specialize during 
development. 

Ectopic expression. Expression 
(transcription and translation) of a 
gene at a time or place where it is 
normally not expressed. 

Fate map. The diagrammatic 
representation of the cells in the 
embryo and the eventual type of 
tissue they will become in the 
adult. 

Homeobox. A 180-nucleotide 
section of DNA that codes for a 
specific class of DNA-binding 
proteins; first found in the 
homeotic genes of Drosophila 
melanogaster. 

Homeotic genes. Early 
developmental genes that specify 
segment identity. 

Ligand. A molecule that binds to 
a protein, usually at a specific 
binding site. 

Maternal effect. The condition 
where the phenotype is 
determined not by the individual's 
own genotype but by its mother's 
genotype. 

Microarray chip. Set of 
miniaturized biochemical 
reactions that occur in small spots 
on a microscope slide, which may 
be used to test DNA fragments, 
antibodies, or proteins. 



Motif. A short region in a protein 
sequence, which is conserved in 
many proteins. 

Multipotent. Cells that can 
produce a defined set of cell types. 

Pluripotent. Cells that can 
produce most, but not all, types of 
cells of the adult organism. 

Programmed cell death. 
Death of cells that is part of the 
normal process of development of 
an organism. (See also apoptosis 
in the Cell Biology and Cancer unit.) 

Regulatory element. 
Sequences near the coding regions 
of genes to which transcription 
factors can bind, thus influencing 
transcription. 

Southern blot. A technique for 
transferring DNA fragments 
separated by electrophoresis to a 
filter paper sheet. The fragments 
are then probed with a labeled, 
complementary nucleic acid to 
help determine their positions. 

ssDNA. Single-stranded DNA. A 
DNA molecule consisting of only 
one chain of alternating sugars 
(deoxyribose) and phosphates. 

Totipotent. Cells that can 
replicate to form any part of a 
complete organism. 

Transcription factor. A protein 
that influences transcription of 
another gene by binding to DNA. 

Transgenic organism. 
An organism that contains 
hereditary information from two 
different species of organisms. 
Cell Biology and Cancer 



"We now understand a lot about cancer. We know that 
it results from a series of genetic changes having to 
do with cell division and growth control and genetic 
instability, mortality, the suicide mechanism in cells; 
the ability of the cells to migrate; the ability of the cells 
to attract to them a blood supply. And so that's pretty 
profound that in a few sentences one can summarize 
a sophisticated, fundamental understanding of what 
a cancer is." Leland Hartwell 
Introduction 

A multicellular organism can thrive only when all its cells function in 
accordance with the rules that govern cell growth and reproduction. 
Why does a normal cell suddenly become a "rebel," breaking the rules, 
dividing recklessly, invading other tissues, usurping resources, and in 
some cases eventually killing the body in which it lives? 

To understand how and why cells rebel, we need to understand the 
normal functions of cell growth and reproduction. From the 
mid-nineteenth century on, research in cell biology, biochemistry, and 
molecular biology has provided astonishingly detailed information 
about the molecules and processes that allow cells to divide, grow, 
differentiate, and perform their essential functions. This basic 
knowledge of cell biology has also led to practical discoveries about 
the mechanisms of cancer. Specific molecules that control the 
progression of a cell through the cell cycle regulate cell growth. An 
understanding of normal cell cycle processes and how those processes 
go awry provides key information about the mechanisms that trigger 
cancer. Loss of control of the cell cycle is one of the critical steps in the 
development of cancer. 

Although cancer comprises at least 100 different diseases, all cancer 
cells share one important characteristic: they are abnormal cells in 
which the processes regulating normal cell division are disrupted. 
That is, cancer develops from changes that cause normal cells to 
acquire abnormal functions. These changes are often the result of 
inherited mutations or are induced by environmental factors such as 
UV light, X-rays, chemicals, tobacco products, and viruses. All evidence 
suggests that most cancers are not the result of one single event or 
factor. Rather, around four to seven events are usually required for a 
normal cell to evolve through a series of premalignant stages into an 
invasive cancer. Often many years elapse between the initial event and 
the development of cancer. The development of molecular biological 
techniques may help in the diagnosis of potential cancers in the early 
stages, long before tumors are visible. 

What Is Cancer? 

Cancer results from a series of molecular events that fundamentally 
alter the normal properties of cells. In cancer cells the normal control 
systems that prevent cell overgrowth and the invasion of other tissues 
are disabled. These altered cells divide and grow in the presence of 
signals that normally inhibit cell growth, and they no longer 
require special signals to induce cell growth and division. As these cells 
grow they develop new characteristics, including changes in cell 
structure, decreased cell adhesion, and production of new enzymes. 
These heritable changes allow the cell and its progeny to divide and 
grow, even in the presence of normal cells that typically inhibit the 
growth of nearby cells. Such changes allow the cancer cells to spread 
and invade other tissues. 

The abnormalities in cancer cells usually result from mutations in 
protein-encoding genes that regulate cell division. Over time more 
genes become mutated. This is often because the genes that make the 
proteins that normally repair DNA damage are themselves not 
functioning normally because they are also mutated. Consequently, 
mutations begin to increase in the cell, causing further abnormalities 
in that cell and the daughter cells. Some of these mutated cells die, but 
other alterations may give the abnormal cell a selective advantage that 
allows it to multiply much more rapidly than the normal cells. This 
enhanced growth describes most cancer cells, which have gained 
functions repressed in the normal, healthy cells. As long as these cells 
remain in their original location, they are considered benign; if they 
become invasive, they are considered malignant. Cancer cells in 
malignant tumors can often metastasize, sending cancer cells to distant 
sites in the body where new tumors may form. 

Genetics of Cancer 

Only a small number of the approximately 35,000 genes in the human 
genome have been associated with cancer. (See the Genomics unit.) 
Alterations in the same gene often are associated with different forms 
of cancer. These malfunctioning genes can be broadly classified into 
three groups. The first group, called proto-oncogenes, produces 
protein products that normally enhance cell division or inhibit normal 
cell death. The mutated forms of these genes are called oncogenes. 
The second group, called tumor suppressors, makes proteins that 
normally prevent cell division or cause cell death. The third group 
contains DNA repair genes, which help prevent mutations that lead 
to cancer. 

Proto-oncogenes and tumor suppressor genes work much like the 
accelerator and brakes of a car, respectively. The normal speed of a car 
can be maintained by controlled use of both the accelerator and the 
brake. Similarly, controlled cell growth is maintained by regulation of 
proto-oncogenes, which accelerate growth, and tumor suppressor genes, 
which slow cell growth. Mutations that produce oncogenes accelerate 
growth while those that affect tumor suppressors prevent the normal 
inhibition of growth. In either case, uncontrolled cell growth occurs. 
Oncogenes and Signal Transduction 

In normal cells, proto-oncogenes code for the proteins that send a 
signal to the nucleus to stimulate cell division. These signaling proteins 
act in a series of steps called a signal transduction cascade or pathway 
(Fig. 1). (See the Genetics and Development unit.) This cascade includes 
a membrane receptor for the signal molecule, intermediary proteins 
that carry the signal through the cytoplasm, and transcription factors 
in the nucleus that activate the genes for cell division. In each step of 
the pathway, one factor or protein activates the next; however, some 
factors can activate more than one protein in the cell. Oncogenes are 
altered versions of the proto-oncogenes that code for these signaling 
molecules. The oncogenes activate the signaling cascade continuously, 
resulting in an increased production of factors that stimulate growth. 
For instance, MYC is a proto-oncogene that codes for a transcription 
factor. Mutations in MYC convert it into an oncogene associated with 
seventy percent of cancers. RAS is another proto-oncogene; it normally 
functions as an "on-off" switch in the signal cascade. Mutations in RAS 
cause the signaling pathway to remain "on," leading to uncontrolled 
cell growth. About thirty percent of tumors — including lung, colon, 
thyroid, and pancreatic carcinomas — have a mutation in RAS. 
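
The idea of RAS as an "on-off" switch that a mutation can lock in the "on" position can be illustrated with a deliberately simplified model. The sketch below, in Python, reduces the cascade to three Boolean steps; the function and parameter names are hypothetical, and real pathways involve many more components.

```python
def cell_divides(growth_factor_present: bool, ras_stuck_on: bool = False) -> bool:
    """Toy signal transduction cascade: receptor -> RAS -> transcription factor."""
    # The membrane receptor is active only when the signal molecule is bound.
    receptor_active = growth_factor_present
    # Normal RAS is switched on by the receptor; oncogenic RAS is always on.
    ras_active = receptor_active or ras_stuck_on
    # The transcription factor turns on the genes for cell division.
    transcription_factor_active = ras_active
    return transcription_factor_active


print(cell_divides(growth_factor_present=False))                     # False
print(cell_divides(growth_factor_present=True))                      # True
print(cell_divides(growth_factor_present=False, ras_stuck_on=True))  # True
```

The last line is the case of interest: with the switch stuck "on," the cell divides even though no growth signal is present.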
Figure 1. Signal transduction pathway. 

A signal (in this example, a growth 
factor) binds to a tyrosine kinase 
receptor on the outside of the cell. 
This activates the membrane protein 
(through the addition of phosphate 
groups), which in turn activates 
proteins, such as kinases, in the 
cytoplasm. Several other proteins may 
be involved in the cascade, ultimately 
activating one or more transcription 
factors. The activated transcription 
factors enter the nucleus where they 
stimulate the expression of the genes 
that are under the control of that 
factor. This is an example of the RAS 
pathway, which results in cell division. 
The conversion of a proto-oncogene to an oncogene may occur by 
mutation of the proto-oncogene, by rearrangement of genes in the 
chromosome that moves the proto-oncogene to a new location, or by 
an increase in the number of copies of the normal proto-oncogene. 
Sometimes a virus inserts its DNA in or near the proto-oncogene, 
causing it to become an oncogene. The result of any of these events is 
an altered form of the gene, which contributes to cancer. Think again 
of the analogy of the accelerator: mutations that convert proto- 
oncogenes into oncogenes result in an accelerator stuck to the floor, 
producing uncontrolled cell growth. 

Most oncogenes are dominant mutations; a single copy of the mutated gene is 
sufficient for expression of the growth trait. This is also a "gain of 
function" mutation because the cells with the mutant form of the 
protein have gained a new function not present in cells with the 
normal gene. If your car had two accelerators and one were stuck to 
the floor, the car would still go too fast, even if there were a second, 
perfectly functional accelerator. Similarly, one copy of an oncogene is 
sufficient to cause alterations in cell growth. The presence of an 
oncogene in a germ line cell (egg or sperm) results in an inherited 
predisposition for tumors in the offspring. However, a single oncogene 
is not usually sufficient to cause cancer, so inheritance of an oncogene 
does not necessarily result in cancer. 

Tumor Suppressor Genes 

The proteins made by tumor suppressor genes normally inhibit cell 
growth, preventing tumor formation. Mutations in these genes result 
in cells that no longer show normal inhibition of cell growth and 
division. The products of tumor suppressor genes may act at the cell 
membrane, in the cytoplasm, or in the nucleus. Mutations in these 
genes result in a loss of function (that is, the ability to inhibit cell 
growth) so they are usually recessive. This means that the trait is not 
expressed unless both copies of the normal gene are mutated. Using 
the analogy to a car, a mutation in a tumor suppressor gene acts much 
like a defective brake: if your car had two brakes and only one was 
defective, you could still stop the car. 
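
The contrast between dominant oncogenes and recessive tumor suppressor mutations can be stated as two simple rules about how many of a cell's two gene copies are mutated. A minimal sketch in Python, using hypothetical function names and ignoring every other factor that affects cell growth:

```python
def _check_copies(mutant_copies: int) -> None:
    # A diploid cell carries two copies of each gene.
    if mutant_copies not in (0, 1, 2):
        raise ValueError("mutant_copies must be 0, 1, or 2")


def oncogene_accelerates_growth(mutant_copies: int) -> bool:
    """Dominant, gain of function: one stuck accelerator is enough."""
    _check_copies(mutant_copies)
    return mutant_copies >= 1


def tumor_suppressor_brakes_lost(mutant_copies: int) -> bool:
    """Recessive, loss of function: both brakes must fail."""
    _check_copies(mutant_copies)
    return mutant_copies == 2


for copies in (0, 1, 2):
    print(copies, oncogene_accelerates_growth(copies), tumor_suppressor_brakes_lost(copies))
```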

How is it that both genes can become mutated? In some cases, the first 
mutation is already present in a germ line cell (egg or sperm); thus, all 
the cells in the individual inherit it. Because the mutation is recessive, 
the trait is not expressed. Later a mutation occurs in the second copy of 
the gene in a somatic cell. In that cell both copies of the gene are 
mutated and the cell develops uncontrolled growth. An example of 
this is hereditary retinoblastoma, a serious cancer of the retina that 
occurs in early childhood. When one parent carries a mutation in one 
copy of the RB tumor suppressor gene, it is transmitted to offspring 
with a fifty percent probability. About ninety percent of the offspring 
who receive the one mutated RB gene from a parent also develop a 
mutation in the second copy of RB, usually very early in life. These 
individuals then develop retinoblastoma. Not all cases of 
retinoblastoma are hereditary: it can also occur by mutation of both 
copies of RB in the somatic cell of the individual. Because retinoblasts 
are rapidly dividing cells and there are thousands of them, there is a 
high incidence of a mutation in the second copy of RB in individuals 
who inherited one mutated copy. This disease afflicts only young 
children because only individuals younger than about eight years old 
have retinoblasts. In adults, however, mutations in RB may lead to a 
predisposition to several other forms of cancer. 
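
The two figures given for hereditary retinoblastoma can be combined into a rough overall risk for a child of a carrier parent. The sketch below, in Python, assumes the two steps simply multiply (inheriting the mutated copy, then acquiring a mutation in the second copy); the function name is hypothetical and the result is an illustration, not a clinical estimate.

```python
def hereditary_retinoblastoma_risk(
    p_inherit_mutation: float = 0.5,  # fifty percent transmission, from the text
    p_second_mutation: float = 0.9,   # about ninety percent of carriers, from the text
) -> float:
    """Probability that a child of a carrier parent develops retinoblastoma."""
    return p_inherit_mutation * p_second_mutation


risk = hereditary_retinoblastoma_risk()
print(f"{risk:.0%}")  # 45%
```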

Three other cancers associated with defects in tumor suppressor genes 
are familial adenomatous polyposis of the colon (FAP), which 
results from mutations to both copies of the APC gene; hereditary 
breast cancer, resulting from mutations to both copies of BRCA2; and 
hereditary breast and ovarian cancer, resulting from mutations to both 
copies of BRCA1. While these examples suggest that heredity is an 
important factor in cancer, the majority of cancers are sporadic with no 
indication of a hereditary component. Cancers involving tumor 
suppressor genes are often hereditary because a parent may provide a 
germ line mutation in one copy of the gene. This may lead to a higher 
frequency of loss of both genes in the individual who inherits the 
mutated copy than in the general population. However, mutations in 
both copies of a tumor suppressor gene can occur in a somatic cell, so 
these cancers are not always hereditary. Somatic mutations that lead to 
loss of function of one or both copies of a tumor suppressor gene may 
be caused by environmental factors, so even these familial cancers may 
have an environmental component. 



Table 1. Some Genes Associated with Cancer 

NAME | FUNCTION | EXAMPLES of Cancer/Diseases | TYPE of Cancer Gene 
APC | regulates transcription of target genes | Familial Adenomatous Polyposis | tumor suppressor 
BCL2 | involved in apoptosis; stimulates angiogenesis | Leukemia; Lymphoma | oncogene 
BLM | DNA repair | Bloom Syndrome | DNA repair 
BRCA1 | may be involved in cell cycle control | Breast, Ovarian, Prostatic, & Colonic Neoplasms | tumor suppressor 
BRCA2 | DNA repair | Breast & Pancreatic Neoplasms; Leukemia | tumor suppressor 
HER2 | tyrosine kinase; growth factor receptor | Breast, Ovarian Neoplasms | oncogene 
MYC | involved in protein-protein interactions with various cellular factors | Burkitt's Lymphoma | oncogene 
p16 | cyclin-dependent kinase inhibitor | Leukemia; Melanoma; Multiple Myeloma; Pancreatic Neoplasms | tumor suppressor 
p21 | cyclin-dependent kinase inhibitor | | tumor suppressor 
p53 | apoptosis; transcription factor | Colorectal Neoplasms; Li-Fraumeni Syndrome | tumor suppressor 
RAS | GTP-binding protein; important in signal transduction cascade | Pancreatic, Colorectal, Bladder, Breast, Kidney, & Lung Neoplasms; Leukemia; Melanoma | oncogene 
RB | regulation of cell cycle | Retinoblastoma | tumor suppressor 
SIS | growth factor | Dermatofibrosarcoma; Meningioma; Skin Neoplasms | oncogene 
XP | DNA repair | Xeroderma pigmentosum | DNA repair 

DNA Repair Genes 

A third type of gene associated with cancer is the group involved in 
DNA repair and maintenance of chromosome structure. Environmental 
factors, such as ionizing radiation, UV light, and chemicals, can damage 
DNA. Errors in DNA replication can also lead to mutations. Certain 
gene products repair damage to chromosomes, thereby minimizing 
mutations in the cell. When a DNA repair gene is mutated its product is 
no longer made, preventing DNA repair and allowing further 
mutations to accumulate in the cell. These mutations can increase the 
frequency of cancerous changes in a cell. A defect in a DNA repair 
gene called XP (Xeroderma pigmentosum) results in individuals who 
are very sensitive to UV light and have a thousand-fold increase in the 
incidence of all types of skin cancer. There are seven XP genes, whose 
products remove DNA damage caused by UV light and other 
carcinogens. Another example of a disease that is associated with loss 
of DNA repair is Bloom syndrome, an inherited disorder that leads to 
increased risk of cancer, lung disease, and diabetes. The mutated gene 
in Bloom syndrome, BLM, is required for maintaining the stable 
structure of chromosomes. Individuals with Bloom syndrome have a 
high frequency of chromosome breaks and interchanges, which can 
result in the activation of oncogenes. 



Cell Cycle 

Normal cells grow and divide in an orderly fashion, in accordance with 
the cell cycle. (Mutations in proto-oncogenes or in tumor suppressor 
genes allow a cancerous cell to grow and divide without the normal 
controls imposed by the cell cycle.) The major events in the cell cycle 
are described in Fig. 2.



[Figure 2 diagram labels: Quiescence; cell growth & accumulation of
cyclins; G1/S checkpoint; DNA synthesis; Preparation for Mitosis;
G2/M checkpoint; Mitosis; M/G1 checkpoint]

Figure 2. The cell cycle is an ordered process of events that occurs in
four stages. During the two gap phases, G1 and G2, the cell is actively
metabolizing but not dividing. In S (synthesis) phase, the chromosomes
duplicate as a result of DNA replication. During the M (mitosis) phase,
the chromosomes separate in the nucleus and the division of the
cytoplasm (cytokinesis) occurs. There are checkpoints in the cycle at
the end of G1 and G2 that can prevent the cell from entering the S or M
phases of the cycle. Cells that are not in the process of dividing are
in the G0 stage, which includes most adult cells.

Several proteins control the timing of the events in the cell cycle, which
is tightly regulated to ensure that cells divide only when necessary. The 
loss of this regulation is the hallmark of cancer. Major control switches 
of the cell cycle are cyclin-dependent kinases. Each cyclin- 
dependent kinase forms a complex with a particular cyclin, a protein 
that binds and activates the cyclin-dependent kinase. The kinase part 
of the complex is an enzyme that adds a phosphate to various proteins 
required for progression of a cell through the cycle. These added 
phosphates alter the structure of the protein and can activate or 
inactivate the protein, depending on its function. There are specific 
cyclin-dependent kinase/cyclin complexes at the entry points into the 
G1, S, and M phases of the cell cycle, as well as additional factors that 
help prepare the cell to enter S phase and M phase. 
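
The ordered phases and checkpoints described above can be pictured as a
small state machine. The sketch below is only an illustration: the name
next_phase and the single true/false checkpoint flag are hypothetical
simplifications, and a real checkpoint integrates many signals (cyclin
levels, DNA damage, cell size) rather than one value.

```python
# Order of the phases in a dividing cell, as described in Figure 2.
PHASES = ["G1", "S", "G2", "M"]

# Checkpoints at the end of G1 and G2 can hold the cell in place.
CHECKPOINTS = {"G1": "G1/S", "G2": "G2/M"}


def next_phase(phase, checkpoint_ok=True):
    """Return the phase that follows `phase`.

    `checkpoint_ok` is a hypothetical stand-in for every signal a real
    checkpoint would evaluate. It only matters in G1 and G2.
    """
    if phase in CHECKPOINTS and not checkpoint_ok:
        return phase  # the cell is held at the checkpoint
    index = PHASES.index(phase)
    return PHASES[(index + 1) % len(PHASES)]
```

Calling next_phase("G1") moves the model cell into S, while
next_phase("G1", checkpoint_ok=False) leaves it in G1, which mirrors a
cell held at the G1/S checkpoint.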






One important protein in the cell cycle is p53, a transcription factor 
(see the Genes and Development unit) that binds to DNA, activating 
transcription of a protein called p21. The p21 protein blocks the
activity of a cyclin-dependent kinase required for progression through
G1. This block
allows time for the cell to repair the DNA before it is replicated. If the 
DNA damage is so extensive that it cannot be repaired, p53 triggers 
the cell to commit suicide. The most common mutation leading to 
cancer is in the gene that makes p53. Li-Fraumeni syndrome, an 
inherited predisposition to multiple cancers, results from a germ line 
(egg or sperm) mutation in p53. Other proteins that stop the cell cycle 
by inhibiting cyclin-dependent kinases are p16 and RB. All of these
proteins, including p53, are tumor suppressors. 
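
The p53 decision described above can be summarized as a short piece of
logic. This is a simplified sketch, not a model of real signaling: the
function names and the two yes/no inputs are hypothetical, and in a real
cell the extent of damage is not a simple true/false value.

```python
def p53_response(dna_damaged, repairable=True):
    """Outcome for a cell with working p53 (simplified)."""
    if not dna_damaged:
        return "continue cycle"
    if repairable:
        # p53 activates transcription of p21, which blocks the
        # cyclin-dependent kinase needed to progress through G1.
        return "arrest in G1 and repair"
    # Damage too extensive to repair: p53 triggers cell suicide.
    return "apoptosis"


def mutant_p53_response(dna_damaged, repairable=True):
    """Without functional p53 the cell keeps cycling despite damage."""
    return "continue cycle"
```

Comparing the two functions on the same inputs shows why loss of p53 is
so damaging: the mutant cell continues to divide in exactly the cases
where a normal cell would pause or die.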

Cancer cells do not stop dividing, so what stops a normal cell from 
dividing? In terms of cell division, normal cells differ from cancer cells 
in at least four ways. 

• Normal cells require external growth factors to divide. When 
synthesis of these growth factors is inhibited by normal cell 
regulation, the cells stop dividing. Cancer cells have lost the need 
for positive growth factors, so they divide whether or not these 
factors are present. Consequently, they do not behave as part of 
the tissue — they have become independent cells. 

• Normal cells show contact inhibition; that is, they respond to 
contact with other cells by ceasing cell division. Therefore, cells can 
divide to fill in a gap, but they stop dividing as soon as there are 
enough cells to fill the gap. This characteristic is lost in cancer cells, 
which continue to grow after they touch other cells, causing a 
large mass of cells to form. 

• Normal cells age and die, and are replaced in a controlled and 
orderly manner by new cells. Apoptosis is the normal, 
programmed death of cells. Normal cells can divide only about fifty 
times before they die. This is related to their ability to replicate 
DNA only a limited number of times. Each time the chromosome 
replicates, the ends (telomeres) shorten. In growing cells, the 
enzyme telomerase replaces these lost ends. Adult cells lack 
telomerase, limiting the number of times the cell can divide. 
However, telomerase is activated in cancer cells, allowing an 
unlimited number of cell divisions. 

• Normal cells cease to divide and die when there is DNA damage or 
when cell division is abnormal. Cancer cells continue to divide, 
even when there is a large amount of damage to DNA or when the 
cells are abnormal. The progeny of these cancer cells inherit the
abnormal DNA, so as the cancer cells continue to divide, they
accumulate even more damaged DNA.
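
The telomere limit in the third point above lends itself to a small
simulation. The sketch below uses arbitrary units: a starting length of
50 and a loss of 1 per division are assumptions chosen only so that the
limit matches the "about fifty" divisions mentioned in the text, and the
function name is hypothetical.

```python
def divisions_until_senescence(telomere_units=50, loss_per_division=1,
                               telomerase_active=False, max_divisions=1000):
    """Count divisions before the telomeres run out.

    `max_divisions` is a cap that stops the loop for cells whose
    telomeres never shorten (telomerase active).
    """
    divisions = 0
    while telomere_units >= loss_per_division and divisions < max_divisions:
        telomere_units -= loss_per_division
        if telomerase_active:
            # Telomerase replaces the end that was just lost.
            telomere_units += loss_per_division
        divisions += 1
    return divisions
```

With the default values the model cell stops after 50 divisions. With
telomerase_active=True it never runs out of telomere and stops only at
the artificial cap, which corresponds to the unlimited division of
cancer cells.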

What Causes Cancer? 

The prevailing model for cancer development is that mutations in 
genes for tumor suppressors and oncogenes lead to cancer. However, 
some scientists challenge this view as too simple, arguing that it fails to 
explain the genetic diversity among cells within a single tumor and 
does not adequately explain many chromosomal aberrations typical of 
cancer cells. An alternate model suggests that there are "master 
genes" controlling cell division. A mutation in a master gene leads to 
abnormal replication of chromosomes, causing whole sections of 
chromosomes to be missing or duplicated. This leads to a change in
gene dosage, so cells produce too little or too much of a specific
protein. If the chromosomal aberrations affect the amount of one or 
more proteins controlling the cell cycle, such as growth factors or 
tumor suppressors, the result may be cancer. There is also strong 
evidence that the excessive addition of methyl groups to genes 
involved in the cell cycle, DNA repair, and apoptosis is characteristic of 
some cancers. There may be multiple mechanisms leading to the 
development of cancer. This further complicates the difficult task of 
determining what causes cancer. 

Tumor Biology 

Cancer cells behave as independent cells, growing without control to 
form tumors. Tumors grow in a series of steps. The first step is 
hyperplasia, meaning that there are too many cells resulting from 
uncontrolled cell division. These cells appear normal, but changes 
have occurred that result in some loss of control of growth. The 
second step is dysplasia, resulting from further growth, accompanied 
by abnormal changes to the cells. The third step requires additional 
changes, which result in cells that are even more abnormal and can 
now spread over a wider area of tissue. These cells begin to lose their 
original function; such cells are called anaplastic. At this stage, 
because the tumor is still contained within its original location (called 
in situ) and is not invasive, it is not considered malignant — it is 
potentially malignant. The last step occurs when the cells in the tumor 
metastasize, which means that they can invade surrounding tissue, 
including the bloodstream, and spread to other locations. This is the 
most serious type of tumor, but not all tumors progress to this point. 
Non-invasive tumors are said to be benign. 

The type of tumor that forms depends on the type of cell that was 
initially altered. There are five types of tumors. 

• Carcinomas result from altered epithelial cells, which cover the 
surface of our skin and internal organs. Most cancers are 
carcinomas. 

• Sarcomas result from changes in muscle, bone, fat, or 
connective tissue. 

• Leukemia results from malignant white blood cells. 

• Lymphoma is a cancer of the lymphatic system cells that derive 
from bone marrow. 

• Myelomas are cancers of specialized white blood cells that 
make antibodies. 

Angiogenesis 

Although tumor cells are no longer dependent on the control 
mechanisms that govern normal cells, they still require nutrients and 
oxygen in order to grow. All living tissues are amply supplied with 
capillary vessels, which bring nutrients and oxygen to every cell. As 
tumors enlarge, the cells in the center no longer receive nutrients from 
the normal blood vessels. To supply the cells in its center with
nutrients and oxygen, the tumor must induce the formation of new blood
vessels. In a process called angiogenesis,
tumor cells make growth factors which induce formation of new 
capillary blood vessels. The cells of the blood vessels that divide to 
make new capillary vessels are inactive in normal tissue; however,
tumors make angiogenic factors, which activate these blood vessel cells
to divide. Without the additional blood supplied by angiogenesis, 
tumors can grow no larger than about half a millimeter. 

Without a blood supply, tumor cells also cannot spread, or metastasize, 
to new tissues. Tumor cells can cross through the walls of the capillary 
blood vessel at a rate of about one million cells per day. However, not 
all cells in a tumor are angiogenic. Both angiogenic and non- 
angiogenic cells in a tumor cross into blood vessels and spread; 
however, non-angiogenic cells give rise to dormant tumors when they 
grow in other locations. In contrast, the angiogenic cells quickly 
establish themselves in new locations by growing and producing new 
blood vessels, resulting in rapid growth of the tumor. 

How do tumors begin to produce angiogenic factors? An oncogene 
called BCL2 has been shown to greatly increase the production of a 
potent stimulator of angiogenesis. It appears, then, that oncogenes in 
tumor cells may cause an increased expression of genes that make 
angiogenic factors. There are at least fifteen angiogenic factors and 
production of many of these is increased by a variety of oncogenes. 
Therefore, oncogenes in some tumor cells allow those cells to produce 
angiogenic factors. The progeny of these tumor cells will also produce 
angiogenic factors, so the population of angiogenic cells will increase 
as the size of the tumor increases. 

How important is angiogenesis in cancer? Dormant tumors are those 
that do not have blood vessels; they are generally less than half a 
millimeter in diameter. Several autopsy studies in which trauma 
victims were examined for such very small tumors revealed that thirty- 
nine percent of women aged forty to fifty have very small breast 
tumors, while forty-six percent of men aged sixty to seventy have very 
small prostate tumors. Amazingly, ninety-eight percent of people 
aged fifty to seventy have very small thyroid tumors. However, for 
those age groups in the general population, the incidence of these 
particular cancers is only one-tenth of a percent (thyroid) or one 
percent (breast or prostate cancer). The conclusion is that the 
incidence of dormant tumors is very high compared to the incidence 
of cancer. Therefore, angiogenesis is critical for the progression of 
dormant tumors into cancer. 
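
The comparison in the paragraph above can be made explicit with a little
arithmetic. The percentages below are the ones quoted in the text; the
variable and function names are hypothetical, and dividing an autopsy
finding by a population incidence is only a rough comparison.

```python
# (percent with very small tumors at autopsy, percent with diagnosed cancer)
data = {
    "breast (women 40-50)": (39.0, 1.0),
    "prostate (men 60-70)": (46.0, 1.0),
    "thyroid (people 50-70)": (98.0, 0.1),
}


def dormant_to_cancer_ratio(dormant_pct, cancer_pct):
    """How many times more common dormant tumors are than diagnosed cancer."""
    return dormant_pct / cancer_pct


# Rounded to whole numbers for readability.
ratios = {site: round(dormant_to_cancer_ratio(d, c))
          for site, (d, c) in data.items()}
```

On these figures, very small tumors are roughly 39 times (breast), 46
times (prostate), and 980 times (thyroid) more common than the
corresponding diagnosed cancers.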

Viruses and Cancer 

Many viruses infect humans but only a few viruses are known to 
promote human cancer. These include both DNA viruses and 
retroviruses, a type of RNA virus. (See the HIV and AIDS unit.) Viruses 
associated with cancer include human papillomavirus (genital 
carcinomas), hepatitis B (liver carcinoma), Epstein-Barr virus (Burkitt's 
lymphoma and nasopharyngeal carcinoma), human T-cell leukemia 
virus (T-cell lymphoma), and, probably, a herpes virus called KSHV
(Kaposi's sarcoma and some B cell lymphomas). The ability of 
retroviruses to promote cancer is associated with the presence of 
oncogenes in these viruses. These oncogenes are very similar to proto- 
oncogenes in animals. Retroviruses have acquired the proto-oncogene 
from infected animal cells. An example of this is the normal cellular 
c-SIS proto-oncogene, which makes a cell growth factor. The viral form 
of this gene is an oncogene called v-SIS. Cells infected with the virus 
that has v-SIS overproduce the growth factor, leading to high levels of 
cell growth and possible tumor cells.

Viruses can also contribute to cancer by inserting their DNA into a
chromosome in a host cell. Insertion of the virus DNA directly into a 
proto-oncogene may mutate the gene into an oncogene, resulting in 
a tumor cell. Insertion of the virus DNA near a gene in the 
chromosome that regulates cell growth and division can increase 
transcription of that gene, also resulting in a tumor cell. Using a 
different mechanism, human papillomavirus makes proteins that bind 
to two tumor suppressors, p53 protein and RB protein, transforming 
the infected cells into tumor cells. Remember that these viruses
contribute to cancer; they do not by themselves cause it. Cancer, as we have
seen, requires several events. 

Environmental Factors 

Several environmental factors affect one's probability of acquiring 
cancer. These factors are considered carcinogenic agents when there is 
a consistent correlation between exposure to an agent and the 
occurrence of a specific type of cancer. Some of these carcinogenic 
agents include X-rays, UV light, viruses, tobacco products, pollutants, 
and many other chemicals. X-rays and other sources of radiation, such 
as radon, are carcinogens because they are potent mutagens. Marie 
Curie, who discovered radium, paving the way for radiation therapy 
for cancer, died of aplastic anemia, a bone marrow disease
generally attributed to radiation exposure in
her research. Tobacco smoke contributes to as many as half of all 
cancer deaths in the U.S., including cancers of the lung, esophagus, 
bladder, and pancreas. UV light is associated with most skin cancers, 
including the deadliest form, melanoma. Many industrial chemicals are 
carcinogenic, including benzene, other organic solvents, and arsenic. 
Some cancers associated with environmental factors are preventable. 
Simply understanding the danger of carcinogens and avoiding them 
can usually minimize an individual's exposure to these agents. 

The effect of environmental factors is not independent of cancer 
genes. Sunlight alters tumor suppressor genes in skin cells; cigarette 
smoke causes changes in lung cells, making them more sensitive to 
carcinogenic compounds in smoke. These factors probably act directly 
or indirectly on the genes that are already known to be involved in 
cancer. Individual genetic differences also affect the susceptibility of an 
individual to the carcinogenic effects of environmental agents. About
ten percent of the population has an alteration in a gene, causing 
them to produce excessive amounts of an enzyme that breaks down 
hydrocarbons present in smoke and various air pollutants. The excess 
enzyme reacts with these chemicals, turning them into carcinogens. 
These individuals are about twenty-five times more likely to develop 
cancer from hydrocarbons in the air than others are. 
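
The two figures in this paragraph (ten percent of the population, a
twenty-five-fold risk) can be combined to estimate what share of such
cancers would fall in the high-enzyme group. The sketch below is only a
back-of-the-envelope calculation: it assumes that everyone outside this
group shares the same baseline risk, which the text does not state, and
the function name is hypothetical.

```python
def share_of_cases_in_carriers(carrier_fraction, relative_risk):
    """Fraction of all cases expected to occur in the high-risk group.

    Assumes the rest of the population has a baseline relative risk of 1.
    """
    carrier_cases = carrier_fraction * relative_risk
    other_cases = (1 - carrier_fraction) * 1.0
    return carrier_cases / (carrier_cases + other_cases)


share = share_of_cases_in_carriers(0.10, 25)
```

Under that assumption, about 74 percent of hydrocarbon-related cancers
would occur in the ten percent of people with the altered gene.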

Detecting and Diagnosing Cancer 

The most common techniques for detecting cancer are imaging 
techniques such as MRI, X-rays (such as mammograms), CT, and 
ultrasound, which can provide an image of a tumor. Endoscopy allows 
a physician to insert a lighted instrument to look for tumors in organs 
such as the stomach, colon, and lungs. Most of these techniques are 
used to detect visible tumors, which must then be sampled or removed by biopsy
and examined microscopically by a pathologist. The pathologist looks 
for abnormalities in the cells in terms of their shape, size, and
structure, especially the nucleus. In addition, the pathologist looks at
the borders of the tumor to see whether those cells are normal. Based
on examination of the tumor cells, the pathologist determines whether
the tumor is benign or malignant, and whether it is in an
early or late stage of development. Diagnosis may also include the 
removal and examination of lymph nodes to determine whether the 
cancer cells have spread. 

Tumor markers are proteins found more often in the blood of 
individuals with the tumor than in normal individuals. These are not 
ideal compounds for diagnosing cancer for two reasons. First,
individuals without cancer may have elevated levels of the marker, 
leading to false positives. Second, tumor markers are not sufficiently 
elevated in all individuals with cancer to allow their detection. This 
leads to false negatives. One of the most commonly used tumor 
markers is prostate-specific antigen (PSA). It is present in all adult 
males, but its level is increased after both benign and malignant 
changes in the prostate. Therefore, high levels of PSA indicate only 
that further tests are required to determine whether the condition is 
cancer. If prostate cancer is diagnosed, the levels of PSA can help to 
determine the effectiveness of treatment and detect recurrence. 
Another tumor marker is CA125, which is produced by a number of 
different cells, particularly ovarian cancer cells. It is used primarily to 
monitor the efficacy of treatment for ovarian cancer. When the cancer is
responding to treatment, CA125 levels fall. It is not used as a routine 
test for ovarian cancer because many common conditions that cause 
inflammation also increase the level of CA125, leading to a high 
incidence of false positives. 
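
The effect of false positives on a screening test can be illustrated
with a short calculation of positive predictive value, the chance that a
person with a positive result actually has the disease. The numbers used
below (80 percent sensitivity, 90 percent specificity, 1 percent
prevalence) are invented for illustration and are not properties of PSA
or CA125; the function name is also hypothetical.

```python
def positive_predictive_value(sensitivity, specificity, prevalence):
    """Probability that a positive test result reflects real disease."""
    true_positives = sensitivity * prevalence
    false_positives = (1 - specificity) * (1 - prevalence)
    return true_positives / (true_positives + false_positives)


# Hypothetical marker: 80% sensitive, 90% specific, 1% of people affected.
ppv = positive_predictive_value(0.80, 0.90, 0.01)
```

With these made-up values only about 7 percent of positive results come
from people who have cancer, which is why an elevated marker calls for
further tests rather than a diagnosis.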

The earlier a cancer is found the more effectively it can be treated; 
however, early stage cancers typically produce no symptoms. Scientists 
are developing molecular techniques to detect very early cancer. Using
techniques such as mass spectrometry, they are also developing specific 
blood tests to identify a pattern of new proteins in the blood of 
individuals with a particular type of cancer. (See the Proteomics unit.) In 
addition, scientists are developing DNA microarrays to identify genes 
expressed in particular types of cancer cells. (See the Genomics unit.) 

With the sequencing of the human genome and the mapping of 
single nucleotide polymorphisms (SNPs) (see the Genomics unit), 
it may be possible to diagnose particular cancers by identifying cells 
with known gene alterations. In 2002 scientists detected ovarian cancer 
by testing blood for the presence of DNA released by tumor cells. They 
looked for changes in certain alleles at eight SNPs that are 
characteristic of cancer. Using this technique, they successfully 
identified eighty-seven percent of patients known to have early-stage 
ovarian cancer and ninety-five percent of those with late-stage
ovarian cancer. The ability to determine which genetic alterations are 
associated with various cancers opens up the possibility of identifying 
cancerous cells while the cancer is in an early, treatable stage. 

Traditional Treatments 

Because cancer comprises many diseases, doctors use many different 
treatments. The course of treatment depends on the type of cancer, its 
location, and its state of advancement. Surgery, often the first 
treatment, is used to remove solid tumors. It may be the only 
treatment necessary for early stage cancers and benign tumors.

Radiation kills cancer cells with high-energy rays targeted directly to
the tumor. It acts primarily by damaging DNA and preventing its 
replication; therefore, it preferentially kills cancer cells, which rapidly 
divide. It also kills some normal cells, particularly those that are 
dividing. Surgery and radiation treatment are often used together. 

Chemotherapy drugs are toxic compounds that target rapidly growing 
cells. Many of these drugs are designed to interfere with the synthesis 
of precursor molecules needed for DNA replication; they interfere with 
the ability of the cell to complete the S phase of the cell cycle. Other 
drugs cause extensive DNA damage, which stops replication. A class of 
drugs called spindle inhibitors stops cell replication early in mitosis. 
During mitosis, chromosome separation requires spindle fibers made of 
microtubules; spindle inhibitors stop the synthesis of microtubules. 
Because most adult cells don't divide often, they are less sensitive to 
these drugs than are cancer cells. Chemotherapy drugs also kill certain 
adult cells that divide more rapidly, such as the cells that line the
gastrointestinal tract, bone marrow cells, and hair follicle cells. This causes
some of the side effects of chemotherapy, including gastrointestinal 
distress, low white blood cell count, and hair loss. 

Newer Treatments 

Many of the factors that affect normal cell growth are hormones. 
Although cancer cells have lost some of the normal responses to 
growth factors, some cancer cells still require hormones for growth. 
Hormone therapy for cancer attempts to starve the cancer cells of 
these hormones. This is usually done with drugs that block the activity 
of the hormone, although some drugs can block synthesis of the 
hormone. For example, some breast cancer cells require estrogen for 
growth. Drugs that block the binding site for estrogen can slow the 
growth of these cancers. These drugs are called selective estrogen 
receptor modulators (SERMs) or anti-estrogens. Tamoxifen and 
Raloxifene are examples of this type of drug. A ten-year clinical trial of 
these two drugs with 20,000 women began in 1999 to determine their 
effectiveness in preventing breast cancer. Similarly, testosterone (an 
androgen hormone) stimulates some prostate cancer cells. Selective 
androgen receptor modulators (SARMs) are drugs that block the 
binding of testosterone to these cancer cells, inhibiting their growth 
and possibly preventing prostate cancer. 

Newer chemotherapeutic drugs target specific, active proteins or 
processes in cancer cell signal transduction pathways, such as 
receptors, growth factors, or kinases (see Fig. 1). Because the targets
are cancer-specific proteins, the hope is that these drugs will be much 
less toxic to normal cells than conventional cancer drugs. 

The oncogene RAS is mutated in many types of cancer, particularly 
pancreatic cancer, which has a poor rate of survival for those afflicted. 
The RAS protein is only active after it is modified by the addition of a 
specific chemical group. Scientists are developing drugs to inhibit the 
action of the enzyme that adds the chemical group to the RAS protein, 
resulting in an inactive form of RAS. Early tests indicate that these 
drugs show promise for reducing tumors in cancer patients. 

A drug called Gleevec® inhibits cancer cell growth and causes cancer 
cells to undergo apoptosis, or programmed cell death. It binds to 
abnormal proteins in cancer cells, blocking their action in promoting
uncontrolled cell growth. Because it binds only to these abnormal
proteins, Gleevec® does not show the high levels of toxicity of other 
chemotherapy drugs. Gleevec® was developed to treat a relatively rare 
cancer called chronic myeloid leukemia; however, it also appears to 
help other cancers. 

Chemotherapy may fail because the cancer cells become resistant to 
the therapeutic drugs. One of the characteristics of cancer cells is a 
high frequency of mutation. In the presence of toxic drugs, cancer cells 
that mutate and become resistant to the drug will survive and multiply 
in the presence of the drug, producing a tumor that is also resistant to 
the drug. To overcome this problem, combinations of chemotherapy 
drugs are given at the same time. This decreases the probability that a 
cell will develop resistance to several drugs at once; however, such 
multiple resistances do occur. Some drug-resistant cancer cells express a 
gene called MDR1 (multiple drug resistance). This gene encodes a 
membrane protein that can not only prevent some drugs from 
entering the cell, but can also expel drugs already in the cell. Some 
cancer cells make large amounts of this protein, allowing them to keep 
chemotherapy drugs outside the cell. 
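
The reasoning behind combination chemotherapy can be sketched as a
simple probability calculation. The tumor size and the per-drug
resistance probability below are invented for illustration, the function
name is hypothetical, and the calculation assumes that resistance to
each drug arises independently. A pump such as the MDR1 protein breaks
that assumption, because it confers resistance to several drugs at once.

```python
def expected_resistant_cells(tumor_cells, p_resistant_per_drug, n_drugs):
    """Expected number of cells resistant to all drugs in the combination.

    Assumes resistance to each drug is an independent event with the
    same probability.
    """
    return tumor_cells * p_resistant_per_drug ** n_drugs


# Hypothetical tumor of one billion cells, one-in-a-million resistance per drug.
one_drug = expected_resistant_cells(1e9, 1e-6, 1)
two_drugs = expected_resistant_cells(1e9, 1e-6, 2)
```

With these made-up numbers a single drug leaves about a thousand
resistant cells in the tumor, while a two-drug combination leaves an
expected 0.001 cells, so a doubly resistant cell is unlikely but still
possible.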

Another promising target for cancer therapy is angiogenesis. Several 
drugs, including some naturally occurring compounds, have the ability 
to inhibit angiogenesis. Two compounds in this class are angiostatin 
and endostatin; both are derived from naturally occurring proteins. 
These drugs prevent angiogenesis by tumor cells, restricting tumor 
growth and preventing metastasis. One important advantage of 
angiogenesis inhibitors is that, because they do not target the cancer 
cells directly, there is less chance that the cancer cells will develop 
resistance to the drug. 

One contributing factor in cancer is the failure of the immune system 
to destroy cancer cells. Immunotherapy encompasses several 
techniques that use the immune system to attack cancer cells or treat 
the side effects of some types of cancer treatment. The least specific of 
these are the immunostimulants, such as interleukin 2 and alpha 
interferon, which enhance the normal immune response. 

A technique called chemoimmunotherapy attaches chemotherapy 
drugs to antibodies that are specific for cancer cells. The antibody 
then delivers the drug directly to cancer cells without harming 
normal cells, reducing the toxic side effects of chemotherapy. These 
molecules contain two parts: the cancer-cell-specific antibody and a 
drug that is toxic once it is taken into the cancer cell. A similar 
strategy, radioimmunotherapy, couples specific antibodies to 
radioactive atoms, thereby targeting the deadly radiation specifically 
to cancer cells. 

Another immunological approach uses antibodies that inactivate 
cancer-specific proteins, such as growth factors or tumor cell receptors, 
which are required by tumor cells. For example, many breast and 
ovarian cancer cells over-express a receptor protein called HER2. An 
antibody called Herceptin®, which binds HER2, inhibits tumor growth 
by preventing the binding of growth factors to these cells. 

Some cancers, particularly leukemia, are treated with very high doses 
of chemotherapy drugs and radiation intended to kill all the cancer 
cells. The side effect of this harsh treatment is destruction of the bone
marrow, which contains stem cells. Stem cells, immature cells that
develop into blood cells, are essential. After treatment, the patient's 
bone marrow must be restored, either from bone marrow removed 
from the patient before drug therapy or from a compatible donor. 
Although the patient's own bone marrow is best, it can contain cancer 
cells that must be destroyed before it is returned to the patient. 

Table 2. Some Drugs Used in the Treatment of Cancer

CLASS | MECHANISM
selective estrogen receptor modulators (SERM) (Tamoxifen and Raloxifene) | blocks the binding site for estrogen; can slow the growth of estrogen-stimulated cancers
selective androgen receptor modulators (SARM) | blocks the binding site for testosterone; can slow the growth of testosterone-stimulated cancers
spindle inhibitors | stops cell replication early in mitosis
farnesyl transferase inhibitors | blocks the addition of a farnesyl group to RAS, preventing its activation
Gleevec® | binds to abnormal proteins in cancer cells, blocking their action
angiogenesis inhibitors (endostatin, angiostatin) | prevent angiogenesis by tumor cells
immunostimulants (interleukin 2, alpha interferon) | enhance the normal immune response
Herceptin® | antibody that binds to HER2 receptor on tumor cells, preventing the binding of growth factors



Preventing Cancer 

Cancer appears to result from a combination of genetic changes and 
environmental factors. A change in lifestyle that minimizes exposure to 
environmental carcinogens is one effective means of preventing 
cancer. Individuals who restrict their exposure to tobacco products, 
sunlight, and pollution can greatly decrease their risk of developing 
cancer. Many foods contain antioxidants and other nutrients that may 
help to prevent cancer. The National Cancer Institute recommends a 
diet with large amounts of colorful fruits and vegetables. These foods 
supply ample amounts of vitamins A, C, and E, as well as
phytochemicals and other antioxidants that help to prevent cancer. 
There is strong evidence that a diet rich in vegetables and fruits will 
not only reduce the risk of cardiovascular disease, obesity, and 
diabetes, but will also protect against cancer. 

Vaccines also offer some promise for prevention of cancer. The first 
vaccine to prevent cancer was for hepatitis B, which is associated with 
liver cancer. An effective hepatitis B vaccine is available that can 
prevent both hepatitis and the cancer that may follow this infection. In 
2002, test results of a papillomavirus vaccine were reported. Human 
papillomavirus type 16 infects about twenty percent of adults. 
Although most papillomavirus infections do not cause cancer, some are 
associated with cervical cancer. A vaccine against this virus was 
administered to 1,200 young women in the United States. Within 
eighteen months, the vaccine produced high levels of antibodies to the 
virus, and prevented both papillomavirus infection and precancerous 
lesions in all the women. In the control group of about 1,200 women 
who did not receive the vaccine, forty-one infections and nine 
precancerous lesions were found. The vaccine can also prevent genital
warts caused by this virus strain. It appears that vaccines such as these
may help in the fight to prevent cancers associated with viruses. 
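
The trial results above can be expressed as attack rates and a vaccine
efficacy, using the standard formula of one minus the ratio of the two
rates. The group sizes are the approximate figure of 1,200 given in the
text, so the results are approximate as well; the function names are
hypothetical.

```python
def attack_rate(cases, group_size):
    """Fraction of a group that became infected."""
    return cases / group_size


def vaccine_efficacy(cases_vaccinated, n_vaccinated, cases_control, n_control):
    """Efficacy = 1 - (attack rate in vaccinated / attack rate in controls)."""
    vaccinated_rate = attack_rate(cases_vaccinated, n_vaccinated)
    control_rate = attack_rate(cases_control, n_control)
    return 1 - vaccinated_rate / control_rate


control_rate = attack_rate(41, 1200)             # infections in the control group
efficacy = vaccine_efficacy(0, 1200, 41, 1200)   # no infections among the vaccinated
```

About 3.4 percent of the unvaccinated women became infected within the
study period, and because no vaccinated woman did, the calculated
efficacy in this trial is 100 percent.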

Screening, Genetic Tests and Counseling 

Early diagnosis of cancer greatly increases survival; therefore, regular 
exams for cancer can help to prevent deaths from cancer. These include 
mammograms and Pap tests for women, prostate cancer tests for men, 
colonoscopy exams for colon cancer, and regular physical exams for 
other types of cancer. Individuals with a strong family history of cancer 
should consider genetic tests for cancer and cancer risk counseling. The 
focus of cancer risk counseling is the individual's personal risk of 
developing cancer and appropriate actions based on that risk. 

The discovery of the BRCA1 and BRCA2 genes associated with early 
development of breast cancer has allowed women with a family 
history of early breast cancer to be tested for mutations in these genes. 
Only five to ten percent of breast cancers show evidence of 
inheritance. Of these, forty-five percent are associated with a mutation 
in BRCA1 and thirty-five percent with BRCA2. The gene or genes for 
the remaining twenty percent are not yet known. If the BRCA1 and 
BRCA2 test results are negative, there is no evidence that the woman 
will have breast cancer because of these mutations. However, she may 
get breast cancer because of somatic mutations in these or other 
genes. If the BRCA1 or BRCA2 test is positive, other family members 
may be tested to determine whether the gene was inherited. If other 
family members are negative, then there is less chance of hereditary 
risk of this form of cancer, although the individual with the mutation 
does carry an increased risk of the disease. If the test is positive in 
other family members, there is an increased hereditary risk for breast 
cancer in that family. The absence of hereditary risk does not mean 
that there is no other risk for breast cancer. 
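
The percentages in this paragraph can be multiplied through to show how
small a share of all breast cancers each inherited mutation accounts
for. The figures are those quoted in the text; the names used in the
sketch are hypothetical.

```python
# Five to ten percent of breast cancers show evidence of inheritance.
INHERITED_RANGE = (0.05, 0.10)

# Breakdown of the inherited cases, as given in the text.
SHARE_OF_INHERITED = {"BRCA1": 0.45, "BRCA2": 0.35, "unknown": 0.20}


def share_of_all_breast_cancers(gene):
    """Return the (low, high) fraction of all breast cancers tied to `gene`."""
    low, high = INHERITED_RANGE
    share = SHARE_OF_INHERITED[gene]
    return (low * share, high * share)
```

For example, share_of_all_breast_cancers("BRCA1") gives a range of
about 2.25 to 4.5 percent of all breast cancers, and the BRCA2 range is
about 1.75 to 3.5 percent.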

Decisions based on genetic tests can be very complicated. Individuals 
must be fully informed about the risks before they can make 
reasonable decisions. Genetic counselors are trained to help individuals 
make difficult decisions based on genetic tests. The cumulative risk of 
breast cancer to age seventy for a woman with a BRCA1 mutation is 
about fifty-seven to eighty-five percent depending on whether she is 
in a high-risk family. Some women find the fear of cancer so disruptive 
to their lives that they choose mastectomy to prevent cancer. (This is 
called prophylactic mastectomy.) Similarly, women with BRCA1 mutations 
have a high lifetime risk of ovarian cancer, causing some of them to 
choose to have their ovaries removed. While these are difficult 
decisions, genetic testing gives individuals information that they can 
use to make such important medical decisions. For example, a young 
woman with a strong family history of ovarian cancer might find by 
genetic testing that she does not carry the BRCA1 mutation and so may 
not need to consider removal of her ovaries. 

Glossary 



Anaplastic. A term used to 
describe cancer cells that divide 
rapidly and have little or no 
resemblance to normal cells. 

Angiogenesis. Blood vessel 
formation. Tumor angiogenesis is 
the growth of blood vessels from 
surrounding tissue to a solid 
tumor. This is caused by the 
release of chemicals by the tumor. 

Apoptosis. A normal series of 
events in a cell that leads to its 
death. Also called "programmed 
cell death." 

CA125. A substance sometimes 
found in an increased amount in 
the blood, other body fluids, or 
tissues that may suggest the 
presence of some types of cancer. 

Chemoimmunotherapy. 
Chemotherapy combined with 
immunotherapy. Chemotherapy 
uses different drugs to kill or 
slow the growth of cancer cells; 
immunotherapy uses treatments 
to stimulate or restore the ability 
of the immune system to fight 
cancer. 

Cyclin-dependent kinases. 
Proteins that add a phosphate to a 
number of proteins that control 
steps in the cell cycle. 

Cyclins. Proteins that form 
complexes with cyclin-dependent 
kinases to control various steps in 
the cell cycle. 

Dysplasia. Cells that look 
abnormal under a microscope but 
are not cancerous. 

Hyperplasia. An abnormal 
increase in the number of cells in 
an organ or tissue. 

Kinase. An enzyme that catalyzes 
the transfer of a phosphate group 
from ATP to another molecule, 
often a protein. 



Oncogene. An altered form of a 
gene that normally directs cell 
growth. Oncogenes can promote 
or allow the uncontrolled growth 
of cancer. Alterations in a proto- 
oncogene, resulting in an 
oncogene, can be inherited or 
caused by an environmental 
exposure to carcinogens. (See 
proto-oncogene.) 

Phytochemicals. Chemicals 
found in plants. Many of these 
chemicals are thought to reduce a 
person's risk of getting cancer. 

Proto-oncogene. A gene that 
normally directs cell growth; if 
altered it may become an 
oncogene. 

Prostate-specific antigen (PSA). 
A substance produced by the 
prostate that may be found in an 
increased amount in the blood of 
men who have prostate cancer, 
benign prostatic hyperplasia, or 
infection or inflammation of the 
prostate. 

Radioimmunotherapy. 
Treatment with a radioactive 
substance that is linked to an 
antibody that will attach to the 
tumor when injected into the 
body. 

Signal transduction pathway. 
A series of events controlled by 
signal molecules that bind to 
membrane proteins. These, in 
turn, activate cytoplasmic 
proteins, which ultimately activate 
transcription factors. 

Single nucleotide 
polymorphism (SNP). A variation 
in the DNA sequence that occurs 
when a single nucleotide (A, T, C, 
or G) in the genome sequence is 
changed. 



Telomerase. An enzyme that 
replaces the repeat sequences 
at the ends of chromosomes that 
are lost during chromosome 
replication. 

Telomeres. The ends of 
chromosomes containing repeat 
sequences; these ends are 
shortened each time the 
chromosome replicates. 

Transcription factor. A protein 
that influences transcription of 
another gene by binding to DNA. 

Tumor suppressor gene. A gene 
that can suppress or block the 
development of cancer. 
Human Evolution 



"...we come from a long line of failures. We are apes, a 
group that almost went extinct fifteen million years ago 
in competition with the better-designed monkeys. We 
are primates, a group that almost went extinct forty- 
five million years ago in competition with the better- 
designed rodents. We are chordates, a phylum that 
survived in the Cambrian era 500 million years ago by 
the skin of its teeth in competition with the brilliantly 
successful arthropods. Our ecological success came 
against humbling odds." M. Ridley 1 

We humans have always held a special fascination with our place in 
the evolutionary pageantry. Where did we come from? That we share 
close common ancestry with the apes is not in doubt. But what kind of 
an ape are we? How long ago did our lineage separate from the other 
apes? Are we still evolving? What can molecular genetics tell us about 
our history and our future? 

Concerning our place among the apes, Thomas Huxley (known as 
"Darwin's bulldog" because of his popularization of Darwin's theory of 
evolution) provided an early correct answer in 1863. Huxley placed 
humans with chimpanzees and gorillas (the great apes of Africa), and 
separate from orangutans and gibbons. Numerous lines of evidence 
have since strongly supported this view. Morphologically, we share 
many derived traits with the African apes, including enlarged brow 
ridges, elongated skulls, shortened canine teeth, and enlarged 
mammary glands. We are also much more similar at the DNA level to 
the African apes than we are to any other species. 

Figure 1. A gorilla (left) and a 
chimpanzee (right), our closest 
living relatives. 



Scientists have identified three species of African apes: the gorilla 
(Gorilla gorilla) and two species of chimpanzees. Pan troglodytes, the 
common chimpanzee, is the larger of the two. Pan paniscus, which had 
previously been called the "pygmy chimpanzee," is now named the 
"bonobo." In addition to size, the two chimpanzees differ in social 
structure and temperament; the bonobos appear to be more peaceful 
and egalitarian than the common chimpanzees. Both species of 
chimpanzees use tools; however, the types of tools vary in different 
populations of both species. 

By about 1980 a combination of crude molecular techniques — based 
on the divergence of certain proteins and the fossil record — allowed 
us to determine that humans, chimps, and gorillas last shared a 
common ancestor approximately five million to eight million years ago. 
Those methods, however, were not able to ascertain the order in which 
the three species split. Several questions remained. For instance, are 
chimpanzees and gorillas each other's closest relatives? Or is the closest 
relationship between humans and chimpanzees? Or is it between 
humans and gorillas? 

We could encapsulate the outdated view of evolution as progress up a 
ladder of changes. First the ability to walk upright (bipedalism) 
appeared. Soon after, the lineage leading to humans (the hominids) 
split off from the other African apes. Many fossils of the genus 
Australopithecus demonstrate that the earliest bipedal hominids did 
not substantially differ from chimpanzees in brain size. In the outdated 
view of human evolution, there was a slow steady increase in brain size 
as Australopithecus afarensis (the species of the well-known fossil 
"Lucy") evolved into Homo habilis, and then into Homo erectus. Brain 
size continued to increase until the appearance of Homo 
neanderthalensis (the Neanderthals), who looked much like us but had 
larger brow ridges. 

This older view of human evolution is not so much incorrect as it is 
incomplete and misleading. New fossil evidence demonstrates that the 
hominid lineage, our family tree, is more bush-like than ladder-like. 
Studies of these fossils show that several species of hominids coexisted 
for long periods of time. New molecular genetic evidence allows us to 
address which two of the three species (chimpanzee, gorilla, and 
human) are each other's closest relatives. Genetic data can also 
address the patterns of variation within and among human 
populations. In addition, the molecular genetic data demonstrate how 
infectious disease has shaped genetic variation in humans. 

New Fossils 

During the 1990s paleoanthropologists unearthed dozens of new fossil 
hominids. These have been particularly useful for illuminating the 
changes that took place as the human lineage split from the chimp 
lineage. One important find was made in Ethiopia by Tim White 
(University of California-Berkeley), Gen Suwa (University of Tokyo), and 
others, who dated the fossil to 4.4 million years ago. 
The fossil, Ardipithecus ramidus, probably represents a transitional 
form with respect to the evolution of bipedalism: while it may have 
been able to walk upright, it had a different posture than we do. It 
probably spent some time upright and some time walking like a chimp, 
on its knuckles. In other respects, it looked much like a chimp, except 
for subtle differences in teeth and skull. The first clearly bipedal 
hominids, Australopithecus anamensis and Australopithecus afarensis, 
appeared about 4.1 million years ago, shortly after A. ramidus. 

Figure 2. The human "bush," 
as postulated from fossil finds of 
hominid species. 

Other fossil discoveries illustrate the bushiness of the human lineage. 
As seen in the illustration, as many as four different apparent species 
often lived at the same time (Fig. 2). While there was a general trend 
toward increased brain size with time, species with considerably 
different brain sizes lived simultaneously. Questions remain about how 
different species replaced previous ones. Was it through warfare? Was 
it that the replacing species were better competitors? Perhaps it was 
simply a random event. We don't really know. 

Despite the inferences we can draw from these new fossil findings, the 
fossil record still has limitations; it is incomplete. How can one 
determine whether different fossils belong to the same species? 
Species determinations are based on the ability, or the perceived 
ability, of different groups to interbreed. In cases where it is infeasible 
or immoral to do experiments crossing the two groups, one can infer 
the capacity for the groups to interbreed based on genetic data. Yet, 
with few exceptions, scientists cannot extract DNA evidence from 
fossils; only morphological characters are available. How then can one 
make the inferences about the capacity to interbreed? For instance, 
sexual dimorphism may lead one to classify males and females of the 
same population as separate species. 

What Does DNA Tell Us About Our 
Position Among the Apes? 

The new genetic data have substantially contributed to our 
understanding of the relationship between our species and its closest 
relatives. Based on several independent lines of evidence, we can now 
say with confidence that humans are more closely related to 
chimpanzees than to gorillas (Fig. 3). While the two species of 
chimpanzees are each other's closest relatives, their next closest 
relative is H. sapiens and not G. gorilla. 

Figure 3. A tree showing the 
evolution of the hominoids, including 
the great apes and humans. 

How do we know this? Evolutionary geneticists have been increasingly 
able to draw better and more robust inferences about the 
relationships among different organisms based on morphological and 
molecular genetic data, and new systematic methodology. These 
methods have also been used to determine our position among the 
apes. (See the Evolution and Phylogenetics unit.) In essence, groups of 
organisms (known as taxa) are placed into clades that are nested in 
larger clades based on shared ancestry. All of the taxa in a given clade 
are assumed to have a single common ancestor. 
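
As a rough illustration of how nested clades can be represented, the sketch below encodes the relationships described in this unit as nested Python tuples and lists every clade the tree implies. The tuple encoding and the names APE_TREE, leaves, and clades are assumptions made for this example; they are not taken from any phylogenetics package or from the studies discussed here.

```python
# Tree written as nested tuples: each tuple is a branching point whose
# members share a single common ancestor (a clade).
APE_TREE = ((("P. troglodytes", "P. paniscus"), "H. sapiens"), "G. gorilla")


def leaves(node):
    """Return the set of taxa found at or below this node."""
    if isinstance(node, str):
        return frozenset([node])
    result = frozenset()
    for child in node:
        result |= leaves(child)
    return result


def clades(node):
    """Return every clade (set of taxa sharing an ancestor) in the tree."""
    if isinstance(node, str):
        return []
    found = [leaves(node)]
    for child in node:
        found.extend(clades(child))
    return found


if __name__ == "__main__":
    for clade in clades(APE_TREE):
        print(sorted(clade))
```

In this encoding the two chimpanzee species form the innermost clade, humans join them in the next clade out, and gorillas join only at the root. No clade contains humans and gorillas without chimpanzees.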

The first DNA-based data used to determine the relationships of the 
African apes came from mitochondria. These intracellular organelles 
enable animals to use aerobic respiration and have DNA that evolves 
relatively quickly in mammals. Consequently, mitochondrial DNA 
(mtDNA) is useful in analyzing the relationships of closely related 
species and populations within species. Mitochondria are also abundant 
in cells and, thus, mtDNA was easier to obtain than nuclear DNA. 

DNA amplification technologies such as the polymerase chain reaction 
(PCR), which came into wide use during the 1990s, make obtaining 
sufficient quantities of DNA much easier. (See the Genetically Modified 
Organisms unit.) Yet, for historical reasons, most taxonomic studies 
that used DNA characters were first done with mtDNA. Based on the 
evidence from mtDNA sequences, chimpanzees and humans were 
determined to be each other's closest relatives. These studies further 
suggest that humans and chimpanzees separated almost five million 
years ago, and the human-chimp clade separated from gorillas almost 
eight million years ago. 
Critics raised an important point about the inferences based on the 
mtDNA studies: they are based on only a single, independently evolving 
gene region. When one considers very closely related groups of 
species, the constructed phylogenetic tree based on data from one 
gene may be different than one constructed from a different gene. 
Either one or both gene trees may not accurately reflect the true 
evolutionary history of the species. This phenomenon occurs because 
of genetic variation (polymorphism) in the ancestral species. 
Ancestral polymorphism can segregate differently in the different 
descendant species; that is, in one of the different descendant species, 
one of the variants may become fixed and in another descendant 
species a different variant may be fixed. Either natural selection or 
random genetic drift can cause this phenomenon. In either case, there 
is a possibility that the history of the gene region may not reflect the 
history of the species. In other words, suppose that gorillas really did 
split first from the lineage containing humans and chimps. It would still 
be possible that the phylogenetic tree based on a single gene may 
have chimps splitting off first from humans and gorillas, or humans 
splitting off first from chimps and gorillas. 

In the case of determining the relationships among the African apes, 
the solution to this challenge was simply the collection of more data 
from more genes. Mary-Ellen Ruvolo analyzed data sets from fourteen 
independent gene regions. In eleven of the cases, humans and chimps 
are each other's closest relatives (sister taxa). In two cases, gorillas 
and chimpanzees are sister taxa, and in one case humans and gorillas 
are sister taxa. Statistical tests show that these results are highly 
unlikely to arise unless humans and chimpanzees are indeed each 
other's closest relatives. Subsequent analyses with even more genes 
have corroborated the conclusion reached by Ruvolo and the earlier 
mitochondrial DNA studies: we are closest to chimps. 
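
To give a feel for why an eleven-of-fourteen result is hard to explain by chance, the sketch below computes a simple binomial tail probability. It assumes a null model in which each of the three possible pairings is equally likely for any one gene region. That null model and the function name tail_probability are assumptions made for illustration; the text does not say which statistical test was actually used.

```python
from math import comb


def tail_probability(successes, trials, p):
    """P(X >= successes) for X ~ Binomial(trials, p)."""
    return sum(
        comb(trials, k) * p**k * (1 - p) ** (trials - k)
        for k in range(successes, trials + 1)
    )


# Counts reported in the text: 14 gene regions, 11 grouping humans with chimps.
GENE_REGIONS = 14
HUMAN_CHIMP_TREES = 11

# Assumed null model: no true pairing, so each of the three possible
# pairings is equally likely (probability 1/3) for any one gene region.
p_value = tail_probability(HUMAN_CHIMP_TREES, GENE_REGIONS, 1 / 3)

if __name__ == "__main__":
    print(f"P(at least 11 of 14 by chance) = {p_value:.5f}")
```

Under this assumed null model the probability is well below one in a thousand, in line with the statement that such a split is highly unlikely unless humans and chimpanzees really are sister taxa.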

Variation Within and Among 
Human Populations 

At the DNA level, humans are both very similar to and very different 
from one another. On average, pairs of individual humans share 99.9% 
DNA sequence identity. Due to the sheer size of our genomes, 
however, we possess numerous differences from one another. The 
human genome consists of just over three billion nucleotides; that 
0.1% difference represents roughly three million variants between the 
average pair. The vast majority of these variants have no functional 
significance. However, even if one in a thousand did, that would still 
mean that we would each differ at thousands of functionally 
important sites. 
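
The arithmetic behind these figures can be checked in a few lines. The sketch below uses the rounded values from the text (three billion nucleotides, 99.9% identity) and treats the "one in a thousand" fraction as the hypothetical it is; the variable names are chosen for this example only.

```python
GENOME_SIZE = 3_000_000_000      # nucleotides, rounded from "just over three billion"
SEQUENCE_IDENTITY = 0.999        # 99.9% identity between two people

# Sites at which an average pair of people differ.
differences = round(GENOME_SIZE * (1 - SEQUENCE_IDENTITY))

# Hypothetical fraction from the text: suppose 1 variant in 1,000 matters.
FUNCTIONAL_FRACTION = 1 / 1000
functional_differences = round(differences * FUNCTIONAL_FRACTION)

if __name__ == "__main__":
    print("differing sites:", differences)
    print("functionally important differing sites:", functional_differences)
```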

How does this variation compare with that of other species? Humans 
actually have less genetic variation than do their closest relatives. For 
instance, the average difference between two randomly selected 
chimpanzees is roughly four times greater than between two humans. 
This is, at first glance, surprising. Based on population genetic theory, 
levels of genetic variation within species should correlate positively 
with population size. This predicted correlation comes about because 
the strength of random genetic drift — which results in the loss of 
genetic variation — increases at lower population sizes. Yet, the 
human population numbers in the billions, and the population sizes of 
chimpanzees and gorillas are fewer than a hundred thousand. 
What could explain that discrepancy? The strength of genetic drift is 
dependent not on the current census population size but on the 
historical population sizes. The relatively low levels of genetic variation 
in humans can be explained by a severe, but short-lasting, population 
bottleneck, where the population of our species was likely reduced to 
a few thousand. It could also be explained by a more moderate, 
sustained bottleneck, during which the population possibly numbered 
in the tens to hundreds of thousands for a considerably longer time. 
Alternatively, natural selection could also either increase or 
decrease the extent of variation in one of the species. Yet, because it is 
unlikely that natural selection would act in the same way on multiple 
regions of the genome, the difference in the extent of genetic 
variation between humans and chimpanzees is more likely a 
consequence of historical demography. 
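
The two bottleneck scenarios can be compared with a small calculation. The sketch below uses the standard population genetic expectation that, under drift alone in an idealized diploid population of size N, heterozygosity falls by a factor of (1 - 1/(2N)) each generation. The population sizes and durations are hypothetical values chosen for illustration, not estimates from the studies discussed here, and mutation and selection are ignored.

```python
def remaining_variation(population_sizes):
    """Fraction of heterozygosity left after drift in each generation.

    population_sizes: one (idealized, diploid) population size per generation.
    Uses the standard expectation H_next = H * (1 - 1 / (2N)); mutation and
    selection are ignored.
    """
    fraction = 1.0
    for n in population_sizes:
        fraction *= 1 - 1 / (2 * n)
    return fraction


# Hypothetical scenarios; sizes and durations are illustrative only.
severe_short = remaining_variation([2_000] * 500)
moderate_long = remaining_variation([50_000] * 12_500)
no_bottleneck = remaining_variation([1_000_000] * 500)

if __name__ == "__main__":
    print(f"severe, short bottleneck:  {severe_short:.4f}")
    print(f"moderate, long bottleneck: {moderate_long:.4f}")
    print(f"large population:          {no_bottleneck:.4f}")
```

With these assumed numbers the short, severe bottleneck and the long, moderate one remove nearly the same share of variation, which is why genetic data alone may have trouble telling the two histories apart.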

How is this variation partitioned according to known racial groups? 
During the 1970s the then state-of-the-art technique of electrophoresis 
of protein variants showed that around eighty to ninety percent of 
human genetic variation was within ethnic populations, five to ten 
percent was among ethnic populations within the major racial groups, 
and only about five to ten percent was among the major racial groups. 
In other words, "if everybody on earth became extinct except for the 
Kikuyu of East Africa, about eighty-five percent of all human variation 
would still be present in the reconstituted species." 2 More recent 
analyses of DNA sequence data strongly confirm the results of earlier 
protein electrophoresis studies. In both the protein electrophoresis and 
the DNA sequence studies, the differences between racial groups are 
generally ones of frequency and not of kind. The situation of "fixed 
differences" — in which all individuals in one group have variant A and 
all individuals of another group have variant B — is extremely rare in 
humans. Instead, groups vary by having different frequencies of genetic 
variants. There are cases of "private alleles," however, where genetic 
variants are found in low to intermediate frequencies in some 
populations but are virtually absent from others. 

Out of Africa? 

As with determining the relationships of the apes, the first DNA-based 
studies of the relationships of human populations also used 
mitochondrial DNA. In mammals, mitochondria have an interesting 
inheritance pattern: they are transmitted nearly exclusively along 
maternal lines. Although males have mitochondria, they do not 
transmit them to their offspring. Thus, all of your mitochondria came 
from your mother, who received them from her mother (your maternal 
grandmother), and so on back along the maternal line. In 1987 Rebecca 
Cann, Mark Stoneking, and Alan Wilson (then at University of 
California-Berkeley) published a controversial and provocative paper in 
Nature, stating that they had located the common ancestor of all 
mitochondrial variants, the so-called Mitochondrial Eve (Fig. 4). They 
placed her in Africa 
approximately 200,000 years ago; subsequent studies have found 
similar results. Fifteen years after that first paper the results remain a 
source of interest and controversy. 

Why are the Mitochondrial Eve studies a continual source of 
controversy within the human evolutionary genetics community? That 
there is a common ancestor of mitochondrial DNA sequences is not a 
surprise. In fact, it is a consequence of Mendelian genetics: genes taken 
from any sample within a population will share a common ancestor. 
Take pairs of gene copies in the same population. Some of them will 
share the same ancestor from one generation ago; they are from 
siblings of the same parents. Some pairs of gene copies will trace a 
common ancestor two generations back. Some pairs will share 
common ancestry even further back. However, eventually, all copies 
will share a common ancestor. What is of interest is how long it takes 
all gene copies to coalesce to that ancestor. 
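
The idea that sampled gene copies must eventually trace back to one ancestor can be sketched with a short simulation. The model below is a deliberately idealized one: a constant-size population in which each lineage picks its parent copy at random every generation. The function name, the population size, and the sample size are all assumptions chosen for illustration and do not correspond to any real human data.

```python
import random


def generations_to_common_ancestor(sample_size, population_size, rng):
    """Trace sampled gene copies back until they share one ancestor.

    Each generation back in time, every remaining lineage picks its parent
    copy uniformly at random from `population_size` copies (an idealized,
    constant-size population). Lineages that pick the same parent merge.
    """
    lineages = sample_size
    generations = 0
    while lineages > 1:
        parents = {rng.randrange(population_size) for _ in range(lineages)}
        lineages = len(parents)
        generations += 1
    return generations


if __name__ == "__main__":
    rng = random.Random(1)  # fixed seed so repeated runs match
    times = [generations_to_common_ancestor(20, 500, rng) for _ in range(200)]
    print("average generations to coalesce:", sum(times) / len(times))
```

Every run ends with a single ancestor; what varies from run to run, and what depends on population size, is how many generations that takes.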

What the debate focuses on is the timing and the location of the 
Mitochondrial Eve. The initial studies showed that there was one clade 
consisting only of African individuals, and one with African and other 
individuals. Hence, we can infer that the common ancestor lived in 
Africa. Numerous researchers challenged the methodology of the 
original study. For instance, the original study used African-American 
individuals instead of individuals from Africa. Most of the subsequent 
studies, using more data — including data from individuals from 
several African tribes — and better methodology seem to confirm that 
Africa is the location of the common ancestor. 




Figure 4. The diagram illustrates 
how one line of mitochondrial DNA 
came to be carried by all living 
humans, passed down to us through 
the "Mitochondrial Eve." 



To determine the age of the Mitochondrial Eve, biologists need to first 
make assumptions about the way evolution proceeds. The usual 
assumption is that changes in the DNA occur roughly in a clock-like 
fashion — that there is a so-called molecular clock. The molecular 
clock assumes that groups separated by twenty nucleotide changes 
have common ancestors that are roughly twice as old as those separated 
by ten nucleotide changes. No one believes DNA evolution proceeds in 
a perfect clock-like manner. What is debated is the extent to which the 
clock assumption can provide an estimate about divergence times. The 
usefulness and the accuracy of molecular clocks have been 
controversial ever since Zuckerkandl and Pauling proposed them in 
the 1960s. Yet, most evolutionary biologists agree that the molecular 
clock concept has at least some validity. 
Inferring dates based on a molecular clock also requires that one 
calibrate the clock. How quickly do the changes occur in the lineage(s) 
of interest? Different molecular clocks based on different regions of 
the genome or different types of organisms don't all tick at the same 
rate. To calibrate a molecular clock, researchers usually use a lineage 
split for which they have at least some degree of confidence about 
when it occurred. They then divide the amount of genetic divergence 
by the time since the groups last shared a common ancestor. In the 
case of mtDNA, the calibration was set by the human-chimp split of six 
million years. Because chimps and humans differ by about twelve 
percent of nucleotides in mtDNA, the rate of change for the hominid 
lineage for mtDNA is about two percent per million years. The average 
total divergence from contemporary sequences to the inferred 
sequence of the mtEve is about 0.4% and, thus, the divergence time is 
about 200,000 years. The confidence limits, however, of this estimate 
are rather large. Christopher Wills once concluded that it is possible 
that the upper end of mtEve's age may be as much as 800,000 years; 
new data places his latest estimate at 400,000 years. 3 
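
The calibration and dating steps described above amount to two divisions. The sketch below reproduces them with the figures quoted in the text; the function names are chosen for this example, and the calculation ignores the wide confidence limits just mentioned.

```python
def rate_per_million_years(percent_divergence, split_age_million_years):
    """Calibrate the clock: percent divergence per million years."""
    return percent_divergence / split_age_million_years


def divergence_time_years(percent_divergence, rate):
    """Convert an observed divergence into years using a calibrated rate."""
    return percent_divergence / rate * 1_000_000


# Figures quoted in the text.
rate = rate_per_million_years(12.0, 6.0)        # human-chimp calibration
eve_age = divergence_time_years(0.4, rate)      # divergence to mtEve

if __name__ == "__main__":
    print(f"rate: {rate:.1f}% per million years")
    print(f"estimated age of mtEve: {eve_age:,.0f} years")
```

Changing either calibration input shifts the estimate proportionally, which is one reason the date of the human-chimp split matters so much to the debate.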

Different genes will often have different evolutionary histories. One 
should not expect the male equivalent ("the Y chromosome Adam") to 
have lived at the same place and the same time as the Mitochondrial 
Eve. Owing in part to having a lower mutation rate, the human Y 
chromosome generally has less variation than the mitochondria, which 
makes analysis more difficult. Nonetheless, recent studies suggest that 
the last common ancestor of all existing human Y chromosomes also 
lived in Africa — but more recently than Mitochondrial Eve. 

Largely from the Mitochondrial Eve studies, one model — the out of 
Africa hypothesis — gained favor among anthropologists and human 
evolutionary geneticists. This hypothesis, which is sometimes called the 
"replacement hypothesis," postulates that modern Homo sapiens 
spread out of Africa, into Europe and Asia, and replaced archaic Homo 
sapiens living in those regions (Fig. 5). In contrast, Milford Wolpoff 
and others have proposed the multiregional hypothesis. They argue 
that the archaic Homo sapiens populations in the different regions 
(Europe, Asia, and Africa) all evolved together into modern Homo 
sapiens. While genetic changes would first occur in one locality, gene 
flow would spread those changes into the other localities. 

The out of Africa and multiregional hypotheses make several distinct 
predictions. One would predict that under the out of Africa hypothesis, 
Africa would be the origin of the common ancestor of variants for 
most of the independent data sets (different genes) tested. The 
multiregional hypothesis would predict a random pattern. Under the 
out of Africa model, the divergence time between the African and the 
non-African populations would have an upper limit of about 200,000 
years. In contrast, the multiregional hypothesis would predict a 
divergence time of approximately one million years. One caveat is that 
the apparent age of the divergence could be reduced by the gene flow 
among the populations. Another caveat is that selection can also alter 
the apparent divergence times. The out of Africa hypothesis also 
predicts that there will be more genetic diversity within the African 
population than within the other populations. 

As of 2003 the evidence seems to favor the out of Africa model, though 
some intermediate positions cannot be ruled out. In nearly all of the 
studies more genetic diversity is seen in the African populations than in 
others. In addition, the divergence times appear more consistent with 
the out of Africa than the multiregional hypothesis. As we obtain more 
and more sequences from different regions of the genome, this 
debate should become resolved. 

Figure 5. Top: The "out of Africa," or 
"replacement," hypothesis suggests all 
living humans evolved from a group 
that originated in Africa. Bottom: The 
"multiregional" hypothesis suggests 
several groups evolved in parallel to 
form today's population of humans. 



Figure 6. Top: An artist's rendering of 
a Neanderthal man. Bottom: An adult 
human skeleton (right) and a Neanderthal 
skeleton (left) side by side. 



Neanderthals in Our Gene Pool? 

Have Neanderthals contributed to our gene pool? This question is 
related to, but is distinct from, the "out of Africa" debate. If 
Neanderthals had made a substantial contribution to the gene pool of 
contemporary humans, replacement models like out of Africa would 
be severely challenged. On the other hand, while the lack of 
Neanderthal contribution to the contemporary human gene pool 
would be consistent with the out of Africa model, that particular result 
alone would not disprove the multiregional hypothesis. It is also 
possible that there was substantial exchange of genes across many 
different human populations but that the Neanderthal population was 
not involved. 

How can we tell whether Neanderthals contributed to the 
contemporary gene pool? You can't get DNA from fossil humans. Or can 
you? Data from fragments of DNA collected from different Neanderthal 
fossils have led to the conclusion that Neanderthals probably did not 
contribute to the contemporary gene pool. In 2000 Igor Ovchinnikov 
and his colleagues were able to obtain small fragments of mtDNA from 
a 29,000-year-old Neanderthal fossil found in the Caucasus Mountains. 
They compared the mitochondrial sequences from their fossil to mtDNA 
from a previously collected Neanderthal fossil from Germany. 
Ovchinnikov and his colleagues concluded, "Phylogenetic analysis places 
the two Neanderthals from the Caucasus and western Germany 
together in a clade that is distinct from modern humans, suggesting 
that their mtDNA types have not contributed to the modern human 
mtDNA pool." 4 

Human Genetic Variation and Disease 

Disease has continued to have a strong impact on human mortality and 
reproduction. One would expect there to be genetic variation for the 
ability to resist disease. Indeed, there is such variation. Moreover, 
biologists have been increasingly able to correlate variation at specific 
genetic loci with susceptibility to, or severity of, various diseases. For 
example, scientists are combining genetic and genealogical data to 
locate genes that affect disease tendency in Icelanders. 

Genetic variation for disease resistance and natural selection associated 
with disease has shaped the evolution of our species. Below, we discuss 
the impact of two infectious diseases: malaria and HIV. We conclude 
with a discussion of the genetics of asthma propensity — an illustration 
of the complex interplay of genetics and environmental effects. 

Malaria, Sickle Cell Anemia, and 
Balancing Selection 

Sickle cell anemia affects approximately 70,000 Americans, almost 
exclusively those with African ancestry. The lifespan of an individual 
with sickle cell anemia is currently approximately 40 years in the 
United States. Before the advent of modern medicine, individuals with 
the disease usually died before they could have offspring. 




Zdeněk Burian, Neanderthal (1960). Courtesy of 
the Moravian Museum. 




Courtesy of the American Museum of Natural History 






The disease is caused by a single amino acid change in 
the beta chain of hemoglobin. Individuals with two copies of the sickle 
form of the gene have sickle cell anemia. Heterozygotes — individuals 
with one normal and one mutant copy of the gene — appear normal 
and do not manifest the disease except under very stressful conditions; 
however, they are carriers. If two carriers have a child, the child has a 
twenty-five percent probability of receiving two copies of the sickle 
form and having the anemia. Approximately ten percent of African 
Americans are carriers. In Africa itself the frequencies of the disease 
and carriers are even higher. 
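
The twenty-five percent figure follows from each carrier parent passing on one of two alleles at random. The sketch below enumerates the four equally likely combinations. The single-letter labels (A for the normal allele, S for the sickle allele) and the function name are conventions adopted for this example.

```python
from fractions import Fraction
from itertools import product


def offspring_genotypes(parent1, parent2):
    """Return each offspring genotype with its probability.

    Parents are two-letter strings such as "AS" (A = normal allele,
    S = sickle allele); each parent passes on one allele at random.
    """
    outcomes = {}
    for allele1, allele2 in product(parent1, parent2):
        genotype = "".join(sorted(allele1 + allele2))
        outcomes[genotype] = outcomes.get(genotype, 0) + Fraction(1, 4)
    return outcomes


carrier_cross = offspring_genotypes("AS", "AS")

if __name__ == "__main__":
    for genotype, probability in sorted(carrier_cross.items()):
        print(genotype, probability)
```

The same function shows that a child of one carrier and one non-carrier ("AS" crossed with "AA") cannot have the anemia but has an even chance of being a carrier.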

If sickle cell anemia is so deadly, why are so many people heterozygous 
carriers of the disease? Moreover, why does the disease afflict 
predominantly one racial group? Surprisingly, the answer has to do 
with malaria. Heterozygote sickle cell carriers are much more resistant 
to malaria than those with just normal hemoglobin. Because 
heterozygotes have the best of both worlds (no sickle cell anemia and 
higher malaria resistance) and malaria is extremely prevalent in Africa, 
the sickle allele can be maintained in balance with the normal allele. 
Note that in the United States, where malaria is rare, the carriers 
possess no such advantage and may even have a small selective 
disadvantage. Therefore, due to the strong selection acting against 
those with the anemia, the frequency of sickle cell anemia should 
slowly decline in the United States. That the frequency of the sickle cell 
allele is higher in African populations than in African-Americans is due 
to both this selection and the genetic mixing between whites and 
blacks in the United States. 
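
The balance described above can be illustrated with the standard one-locus selection model from population genetics, which this unit does not spell out. In the sketch below the fitness costs `s` (normal homozygotes, from malaria) and `t` (sickle homozygotes, from anemia) are hypothetical values chosen only for illustration, and the starting frequency of 0.05 is a rough figure inferred from the ten percent carrier rate quoted earlier.

```python
def next_sickle_freq(q, s, t):
    """One generation of selection on the sickle allele frequency q.

    Relative fitnesses (assumed): AA = 1 - s, AS = 1, SS = 1 - t.
    """
    p = 1.0 - q
    w_aa, w_as, w_ss = 1.0 - s, 1.0, 1.0 - t
    mean_fitness = p * p * w_aa + 2 * p * q * w_as + q * q * w_ss
    return (p * q * w_as + q * q * w_ss) / mean_fitness


def equilibrium_sickle_freq(s, t):
    """Stable frequency when heterozygotes are fittest: q* = s / (s + t)."""
    return s / (s + t)


# Hypothetical costs: 15% for normal homozygotes where malaria is common,
# 85% for individuals with sickle cell anemia.
s_malaria, t_anemia = 0.15, 0.85

q_malaria = 0.01  # start with a rare sickle allele
for _ in range(500):
    q_malaria = next_sickle_freq(q_malaria, s_malaria, t_anemia)

# Without malaria the heterozygote advantage disappears (s = 0).
q_start = 0.05
q_no_malaria = q_start
for _ in range(10):
    q_no_malaria = next_sickle_freq(q_no_malaria, 0.0, t_anemia)

print(q_malaria, equilibrium_sickle_freq(s_malaria, t_anemia))
print(q_start, q_no_malaria)
```

With malaria present, the allele settles at the intermediate frequency `s / (s + t)` rather than being lost or fixed. With `s` set to zero, the same recursion gives a slow decline, because most copies of a rare allele sit in unaffected carriers.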

This situation, where selection actively maintains two or more alleles at 
a locus, is called balancing selection. Balancing selection can arise by 
the heterozygotes having a selective advantage, as in the case of sickle 
cell anemia. It can also arise in cases where rare alleles have a selective 
advantage. In extreme cases, balancing selection can maintain alleles in 
populations long enough for speciation to occur. In such cases, one 
species may have alleles that are more similar to those of the other 
species than they are to other alleles of the same species. One case of 
this phenomenon occurs at loci at the major histocompatibility 
complex (MHC) wherein some human alleles are much more closely 
related to some chimpanzee alleles than they are to other human 
alleles (Fig. 7). The MHC, called the human leukocyte antigen (HLA)
loci in humans, encodes proteins that cells of the immune system use
to recognize foreign invaders. Chimp-like alleles have been
maintained in the human population not
because they are chimp-like, but because either having rare alleles or 
having two different alleles has provided a selective advantage. This 
balancing selection is so powerful that alleles are maintained that 
predate the human/chimp split. 

Figure 7. For nearly all genes, human
alleles cluster together and chimp alleles 
cluster together (left). In the case of the 
major histocompatibility complex (MHC), 
human alleles are often more closely 
related to chimp alleles and vice-versa. 
This occurs due to balancing selection 
maintaining variation at the MHC (right). 

Illustration — Bergmann Graphics 



Resistance to HIV 

Despite the lethality of HIV/AIDS, susceptibility to HIV infection and
progression to AIDS are rather variable. There are individuals who have
been exposed to HIV multiple times but who either remain uninfected
or, if they are infected, progress more slowly to full-blown AIDS. Recent
studies have shown that some of the variation in HIV resistance has a 
genetic component. 

HIV operates by subverting the immune system; therefore, it is logical 
that differences in the immune system may play a role in the genetic 
variation of resistance to HIV. Indeed, some HIV-resistant individuals 
possess different chemokine receptors than HIV-susceptible 
individuals. What's a chemokine receptor? First, let's discuss 
chemokines. 

Chemokines are molecular signals released by cells of the immune 
system that stimulate white blood cells to move to inflamed tissues. 
They are metaphoric "cries for help." The chemokines bind to receptors 
located on the white blood cells. Macrophages — those white blood 
cells that engulf foreign particles and are an early stage of defense — 
possess the chemokine receptor that is encoded by the gene CCR5. By 
subverting the normal function of this chemokine receptor, HIV is able 
to gain entry into macrophages. (See the HIV and AIDS unit.) 

Individuals who have lower expression of this protein due to variants
of the CCR5 gene have an increased resistance to HIV; their 
macrophages are metaphorically more cautious about the signals they 
respond to. The most obvious case of a "more cautious" CCR5 variant is 
the allele that has a deletion of thirty-two nucleotides. Individuals who 
are heterozygous for this variant, CCR5-delta32, have substantially 
increased resistance to HIV infection; if they are infected, progression
to full-blown AIDS is much slower than normal. Individuals who are
homozygous for CCR5-delta32 are virtually completely resistant to HIV.
In some European populations about twenty percent of individuals are
heterozygotes and one percent are homozygotes. In contrast,
the allele is rare in the Asian populations and virtually absent in the 
African populations. 
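
The two European percentages are consistent with each other under Hardy-Weinberg proportions. This is an assumption brought in here for illustration (random mating and no strong selection on the locus), not a claim made in the text. A short sketch, with an illustrative function name:

```python
import math


def hardy_weinberg(q):
    """Expected genotype frequencies for a two-allele locus.

    q is the frequency of the delta32 allele; Hardy-Weinberg
    proportions are assumed.
    """
    p = 1.0 - q
    return {
        "no delta32 copies": p * p,
        "heterozygote": 2 * p * q,
        "delta32 homozygote": q * q,
    }


# Allele frequency implied by about one percent homozygotes.
q_delta32 = math.sqrt(0.01)
ccr5_genotypes = hardy_weinberg(q_delta32)
for genotype, frequency in ccr5_genotypes.items():
    print(f"{genotype}: {frequency:.2f}")
```

An allele frequency of one in ten predicts about 18 percent heterozygotes, close to the twenty percent quoted above.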

Why is this deletion variant present in some populations in such high 
frequencies? HIV is, at most, a couple of centuries old and, more likely,
less than a hundred years old. That isn't sufficient time for natural 
selection to increase the frequency of a rare allele, such as is observed 
in the European populations. Furthermore, the selection pressures 
caused by HIV should be much higher in Africa than in Europe. It is also
probable that the decreased receptivity to chemokines would be 
somewhat costly. Some biologists have suggested that the deletion 
allele could be a vestige of plague resistance. It may have led to 
increased survival during the Black Plague of the fourteenth century in 
Europe, and has had an unintended — but welcome — consequence of 
HIV resistance. The increased frequency of the variant in Europe would 
be consistent with that scenario. 

The environment and, in particular, disease, has continued to exert 
strong pressures on human populations. Generally, we are unable to 
directly observe changes in species because these changes occur on time
scales that exceed human lifespans. Yet, we may be able to detect
small changes in allele frequencies that have occurred in populations 
due to epidemics. 

The Genetics of Asthma, 
a Complex Disease 

Asthma, which can be considered a consequence of an overly sensitive 
immune system, is a substantial and growing health problem. As of the 
year 2000 it was the eighth-most prevalent chronic disease in the 
United States and affected about fifteen million Americans. That's an 
increase of more than fifty percent between 1982 and 1996. While this
dramatic increase underscores a clear environmental component,
asthma is also a genetic disease. Susceptibility to asthma
varies widely and is known to run in families. Identical
(monozygous) twins have a higher concordance of their asthma 
susceptibility than do fraternal (dizygous) twins. 

The genetics of asthma, like the genetics of most prevalent diseases, is 
complex. There is no single gene for asthma, coronary heart disease, or 
most forms of cancer. Moreover, the severity of asthma-related 
symptoms follows a continuum. During the 1990s geneticists became
increasingly able to map complex, continuous conditions to regions of
the genome. About a dozen different regions of the genome have
been identified as having effects on asthma susceptibility.
Interestingly, asthma propensity maps to different genetic regions 
depending on which ethnic group(s) are studied. As summarized by 
Matt Ridley, "the gene that most defined susceptibility to asthma in 
blacks was not the same that most defined susceptibility in whites, 
which was different again from the gene that most defined 
susceptibility in Hispanics." 7 

Why could this be? Michael Wade presents a plausible explanation for 
this failure to replicate the results in different populations: that the 
genetic background is different across the different populations. 5 The 
different populations could have somewhat different allele frequencies 
of genes that act as modifiers of the genes that have a large effect on 
asthma propensity. This would be consistent with what we know about
genetic variation in human populations: differences among
populations are usually ones of frequency, not of kind. Because of
different frequencies of modifier alleles across the populations, a
particular gene may explain more of the variation in asthma sensitivity
in one population than in another.
Determining whether this is the explanation for the different results 
obtained for asthma susceptibility will require first isolating the 
modifier genes and then testing whether their frequencies vary in 
different populations. 

Our History, Our Future

The common ancestor that we shared with chimpanzees about six 
million years ago was much more like modern chimps than like us. In our
lineage, the hominids, many changes occurred: bipedalism,
substantially larger brains, tool use, language, and so on. The genetic 
bases of these important transitional changes remain murky at best. 
What genetics has shown us is that we are one species, somewhat 
lacking in genetic variation, and having only slight differences 
among different populations. Genetic studies have also shown that 
disease and other factors continue to substantially affect our 
evolutionary trajectory. 

Glossary



Balancing selection. Selection
that actively maintains more than one
variant of a gene in a population.

Chemokine. A chemical signal 
that attracts white blood cells to 
infected parts of the body. 

Chemokine receptor. Protein 
associated with the membranes of 
white blood cells that chemokines 
can attach to. 

Clade. An organizational term 
used in cladistics to describe a 
group of related organisms being 
compared. 

Gene tree. A representation of 
the evolutionary history of a 
particular gene or DNA sequence. 

Hominids. All members of the 
lineage that includes Homo 
sapiens and all extinct species 
since it split from the common 
ancestor of humans and apes. 

Mitochondrial Eve. The woman 
who possessed the most recent 
common ancestor of all 
mitochondrial DNA variants 
currently in the human population. 

Molecular clock. The hypothesis 
that, within lineages, DNA 
sequences of a particular gene will 
evolve in a roughly clock-like 
manner; that is, approximately as 
a linear function of time. 



Multiregional hypothesis. The
hypothesis that gene flow
between different regional
populations of archaic Homo
sapiens allowed them all to
evolve together into modern
Homo sapiens; contrasted with the
out of Africa hypothesis.

"Out of Africa" (Replacement 
hypothesis). The hypothesis that
modern Homo
sapiens spread out of Africa into
Europe and Asia and replaced
archaic Homo sapiens living in
those regions; contrasted with the
multiregional hypothesis.

Polymorphism. The presence of 
two or more variants of a genetic 
trait in a population. 

Species tree. A representation of 
the evolutionary relationships of 
different species. 

Sister taxa. The most closely 
related groups of organisms in a 
phylogeny. 

Taxa. Groups or representatives of 
related organisms that are being 
compared; they can vary in 
hierarchical level (such as genus, 
family, order, and so on). 

"The human nervous system is probably the most intricately
organized aggregate of matter on Earth. A single cubic 
centimeter of the human brain may contain well over 
50 million nerve cells, each of which may communicate 
with thousands of other neurons in information-processing 
networks that make the most elaborate computer look 
primitive. These neural pathways control our every 
perception and movement and enable us to learn, think, 
and be conscious of ourselves and our surroundings." 
Campbell and Reece 1 

The most striking differences between humans and other animals are 
in the size and the complexity of our brains. With our big brains we 
have acquired a rich culture, which far exceeds that of any other 
species in scope and complexity. We have developed science to 
understand how and why an immensity of things and processes work, 
including those of our own brain. At the start of the twenty-first 
century neuroscientists are increasingly able to explain the functions of
the brain in molecular terms.

To understand how the brain works we first must consider what the 
brain does. This can be broken down into three basic functions: (1) 
take in sensory information, (2) process information between neurons, 
and (3) make outputs. The neurons that take in information from the 
environment are called sensory neurons. These are specialized to 
respond to a particular stimulus, such as light, heat, chemicals, or 
vibration — anything you might encounter from outside, or even 
inside, the body. The processing within the brain can range from a 
knee-jerk reaction — which takes place entirely in the spinal cord — to 
the strategy adopted by a master chess player. In humans, we usually 
call this "thinking." The output is most often a body movement, which 
results from the action of motor neurons. The brain is the link between 
the outside world and behavior, and is thus crucial for survival. These 
three basic functions are shared by organisms from humans down to 
invertebrates like Caenorhabditis elegans, a nematode that doesn't 
even have a true "brain" but a collection of about three hundred 
neurons. (See the Genes and Development unit.) 

But how does the individual neuron work to carry out these tasks? 
Neurons' unique capabilities arise from their ability to
communicate with one another very rapidly, using both electrical and
chemical communication. Keep in mind, however, that the neuron is 
not the only type of cell in the brain. The neuron may be the star of 
the show but there are other supporting players. Indeed, neurons 
constitute only a small fraction of cells in the brain. For every neuron
there are about ten to fifty supporting cells, called glial cells, in the
brain. The word "glia" comes from the Greek for glue, and these cells are the "glue" of
the nervous system. They perform many vital tasks, including removing 
dead neurons and debris, releasing critical growth factors to neurons, 
and acting as insulating material for the neurons. 

The incredibly complex ways in which brains function exemplify the 
importance of cell-cell interactions. Below we discuss the chemical and 
electrical means by which neurons communicate, and describe how 
various therapeutic and recreational drugs alter these processes at the 
molecular level. We then turn to the molecular nature of memory and 
learning. Finally, we describe recent studies that demonstrate that new 
neurons are being produced continuously in us. 

The Neuron as a Battery 

The neuron is an extraordinarily specialized cell. Most neurons are 
referred to as "bipolar"; they have a cell body and many small 
extensions, called dendrites, at one end, which receive information
(Fig. 1). At the other end is its most striking feature: a long axon that
ends in "synaptic terminals," which send signals to the dendrites of an 
adjacent neuron. The longest axon in the human body, the one that 
goes from the base of the spinal cord to the big toe, is about one 
meter long. Early studies on the physiology of neurons examined those 
from the giant axon of the squid, which is so big that it is visible with 
the naked eye. Note that the neuron, in addition to its specialized 
functions, carries out nearly all of the functions of a normal cell, except 
for division. 

Figure 1. The parts of the neuron:
information is received by dendrites, 
and action potentials are sent out 
from the cell body down the axon 
to the synaptic terminals. 

The neuron is an electric battery and works by changes in its voltage.
Compared with its surroundings, the inside of a "resting neuron" has a 
lower concentration of sodium ions and a higher concentration of 
potassium ions. Because of this imbalance of positively charged ions 
across the membrane, the inside of the resting neuron is negative 
relative to the outside. This difference in voltage is called the 
membrane potential. A typical membrane potential for a neuron at
rest, the resting potential, is -0.07 volts, or -70 mV. Although this is a
rather modest voltage (about five percent of that of an AA battery),
consider that this voltage occurs across a minuscule distance: the
thickness of the cell membrane. Expressed as an electric field, this
amounts to about 100,000 volts per centimeter.
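
Both comparisons in this paragraph are simple arithmetic. The sketch below assumes a membrane about 7 nanometers thick and a nominal 1.5 volt AA battery; neither figure is given in the text, so treat them as round illustrative values.

```python
RESTING_POTENTIAL_V = -0.07      # from the text: -70 mV
MEMBRANE_THICKNESS_NM = 7.0      # assumed membrane thickness
AA_BATTERY_V = 1.5               # assumed nominal AA battery voltage


def field_volts_per_cm(potential_v, thickness_nm):
    """Field strength if the potential drops evenly across the membrane."""
    thickness_cm = thickness_nm * 1e-7  # 1 nm = 1e-7 cm
    return abs(potential_v) / thickness_cm


membrane_field = field_volts_per_cm(RESTING_POTENTIAL_V, MEMBRANE_THICKNESS_NM)
battery_fraction = abs(RESTING_POTENTIAL_V) / AA_BATTERY_V

print(f"field: {membrane_field:,.0f} V/cm")
print(f"fraction of an AA battery: {battery_fraction:.1%}")
```

A thinner or thicker membrane changes the field in inverse proportion, so the result should be read as an order of magnitude rather than a precise value.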

Note that the term "resting neuron" refers only to its electrical state. 
The cell is really not at rest because, in addition to carrying out all of 
the normal functions of the cell, the neuron has to maintain this ionic 
imbalance. This is achieved by the sodium-potassium pump, which 
actively transports potassium in and sodium out. The pump maintains a 
negative voltage because it actually pumps three sodium ions out for 
every two potassium ions it pumps in. The membrane potential of a 
neuron at any given time is the product of many variables, including 
the imbalance of ions across the membrane and the membrane's 
permeability to each ion. In addition to sodium and potassium, 
chloride is an important ion in "setting" a neuron's resting potential
because negatively charged chloride ions can pass through open "leak 
channels" at rest. Another ion crucial for neural communication is 
calcium, which acts as a powerful intracellular signaling molecule once 
it enters through its ion channels. 
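
How ion imbalances and permeabilities combine into a single voltage can be sketched with two standard formulas from physiology, the Nernst equation and the Goldman-Hodgkin-Katz voltage equation. Neither is introduced in this unit, and the concentrations, relative permeabilities, and temperature below are assumed textbook-style values rather than measurements.

```python
import math

R = 8.314        # gas constant, J/(mol*K)
F = 96485.0      # Faraday constant, C/mol
T_BODY = 310.0   # assumed body temperature, K

# Assumed ion concentrations in mM, as (outside, inside).
CONC = {"K": (5.0, 140.0), "Na": (145.0, 15.0), "Cl": (110.0, 10.0)}
# Assumed relative permeabilities of a resting membrane.
PERM_REST = {"K": 1.0, "Na": 0.05, "Cl": 0.45}


def nernst_mv(outside, inside, charge, temp_k=T_BODY):
    """Voltage (mV) at which a single ion species is at equilibrium."""
    return 1000.0 * R * temp_k / (charge * F) * math.log(outside / inside)


def ghk_mv(perm, conc=CONC, temp_k=T_BODY):
    """Membrane potential (mV) from several ions weighted by permeability.

    Chloride is negatively charged, so its inside and outside
    concentrations swap places relative to sodium and potassium.
    """
    numerator = (perm["K"] * conc["K"][0] + perm["Na"] * conc["Na"][0]
                 + perm["Cl"] * conc["Cl"][1])
    denominator = (perm["K"] * conc["K"][1] + perm["Na"] * conc["Na"][1]
                   + perm["Cl"] * conc["Cl"][0])
    return 1000.0 * R * temp_k / F * math.log(numerator / denominator)


e_k = nernst_mv(*CONC["K"], charge=1)
e_na = nernst_mv(*CONC["Na"], charge=1)
v_rest = ghk_mv(PERM_REST)
# Hypothetical case: sodium permeability raised far above its resting value.
v_sodium_open = ghk_mv({"K": 1.0, "Na": 20.0, "Cl": 0.45})

print(f"E_K {e_k:.0f} mV, E_Na {e_na:.0f} mV")
print(f"resting {v_rest:.0f} mV, sodium channels open {v_sodium_open:.0f} mV")
```

With these assumed values the resting membrane sits between the potassium and sodium equilibrium potentials, much closer to potassium because the resting membrane is most permeable to it. Raising the sodium permeability alone is enough to swing the calculated potential positive.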

Voltage-Gated Channels 

The neuron, like all cells, possesses a cell membrane that is mostly lipid. 
Ions like sodium and potassium cannot cross the lipid membrane on 
their own. In all cells, transport of ions, as well as some small molecules,
is carried out by channels, which are very tiny openings in the 
membrane formed by protein pores. These channels are often gated — 
that is, opened or closed — depending on the conditions of the cell. 
When the channels are open, ions can pass through them by diffusion.
Ions will always travel down their electrochemical gradient. For 
example, sodium is much more plentiful outside the cell than inside. It 
is also positively charged, while the inside of the cell is typically 
negatively charged relative to outside. Thus, both the chemical and 
electrical components of the gradient will drive sodium ions into the 
cell when sodium channels open. Voltage-gated channels are those 
in which the membrane potential of the cell determines whether they 
are opened or closed. Other channels can be opened or closed by 
various chemicals, such as neurotransmitters. 

Channel proteins that span the cell membrane form the ion channels. 
To determine the structure of proteins, scientists have often used 
X-ray crystallography. (See the Proteins and Proteomics unit.) In 
2003 Roderick MacKinnon and his colleagues used this technique to
examine the structure of a voltage-gated potassium channel from a
unicellular archaeon. Previous studies had shown that ion channels
have a central ion-conduction pore. Like all proteins, ion channel 
proteins are made up of amino acids, some of which are charged. 
When voltage changes occur, these charged components of the protein 
make very small movements. This can result in more dramatic
conformational changes, causing the channels to open and close. 
MacKinnon's group found that "voltage-sensor paddles" surround this 
pore. It appears that with voltage changes in the membrane, these 
paddles will move and thus permit potassium ions across the 
membrane. 2 Further study of the structure of the different classes of 
ion channels from other species will help elucidate the mechanisms by 
which they allow ion transport. 

The Action Potential 

What is a nerve impulse? A nerve impulse, or an action potential, is a 
series of electrical responses that occur in the cell (Fig. 2). With the
appropriate stimulation, the voltage in the dendrite of the neuron will 
become somewhat less negative. This change in the membrane 
potential, called depolarization, will cause the voltage-gated sodium 
channels to open. Sodium ions will rush in, resulting in a rapid change 
in the charge. At the peak of the action potential, that area of the 
neuron is about 40 mV positive. As the voltage becomes positive, the 
sodium channels close, or inactivate, and the voltage-gated potassium 
channels open. These potassium channels let potassium ions rush out 
of the cell, causing the voltage to become negative again. The 
potassium channels remain open until the membrane potential 
becomes at least as negative as the resting potential. In many cases, 
the membrane potential becomes even more negative than the resting 
potential for a brief period; this is called hyperpolarization. An 
action potential typically lasts a few milliseconds. 
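
The sequence just described (sodium channels open, then inactivate, while potassium channels open and overshoot) can be reproduced numerically. The sketch below uses the classic Hodgkin-Huxley model of the squid giant axon, which this unit does not present; the parameter values are the standard published ones for that model, the stimulus size is an arbitrary choice, and the model's resting potential is -65 mV rather than the -70 mV quoted earlier.

```python
import math

# Standard Hodgkin-Huxley squid-axon parameters (not taken from the text).
C_M = 1.0                                  # membrane capacitance, uF/cm^2
G_NA, G_K, G_LEAK = 120.0, 36.0, 0.3       # maximal conductances, mS/cm^2
E_NA, E_K, E_LEAK = 50.0, -77.0, -54.387   # reversal potentials, mV
V_REST = -65.0                             # model resting potential, mV


def _x_over_one_minus_exp(x, scale):
    """Compute x / (1 - exp(-x / scale)), using the limit `scale` at x = 0."""
    if abs(x) < 1e-7:
        return scale
    return x / (1.0 - math.exp(-x / scale))


def gate_rates(v):
    """Opening and closing rates (per ms) for the m, h, and n gates."""
    return {
        "m": (0.1 * _x_over_one_minus_exp(v + 40.0, 10.0),
              4.0 * math.exp(-(v + 65.0) / 18.0)),
        "h": (0.07 * math.exp(-(v + 65.0) / 20.0),
              1.0 / (1.0 + math.exp(-(v + 35.0) / 10.0))),
        "n": (0.01 * _x_over_one_minus_exp(v + 55.0, 10.0),
              0.125 * math.exp(-(v + 65.0) / 80.0)),
    }


def simulate(stimulus=10.0, stim_start_ms=5.0, t_stop_ms=50.0, dt_ms=0.01):
    """Forward-Euler integration; returns the membrane voltage trace in mV."""
    v = V_REST
    # Start each gate at its steady-state value for the resting voltage.
    gates = {k: a / (a + b) for k, (a, b) in gate_rates(v).items()}
    trace = [v]
    for step in range(round(t_stop_ms / dt_ms)):
        t = step * dt_ms
        i_stim = stimulus if t >= stim_start_ms else 0.0
        # m = sodium activation, h = sodium inactivation, n = potassium gate
        i_na = G_NA * gates["m"] ** 3 * gates["h"] * (v - E_NA)
        i_k = G_K * gates["n"] ** 4 * (v - E_K)
        i_leak = G_LEAK * (v - E_LEAK)
        rates = gate_rates(v)
        v += dt_ms * (i_stim - i_na - i_k - i_leak) / C_M
        for k, (a, b) in rates.items():
            gates[k] += dt_ms * (a * (1.0 - gates[k]) - b * gates[k])
        trace.append(v)
    return trace


trace = simulate()
peak_mv = max(trace)
trough_mv = min(trace)
print(f"peak {peak_mv:.1f} mV, trough {trough_mv:.1f} mV")
```

In this model the `h` gate plays the role of sodium channel inactivation and the `n` gate that of the voltage-gated potassium channels. The trough below the resting value corresponds to the brief hyperpolarization described above.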



Figure 2. A cross-section of an axon, 
with an action potential (AP) moving 
from left to right. The AP has not yet 
reached point 4; the membrane there is 
still at rest. At point 3, positive sodium 
ions are moving in from the adjacent 
region, depolarizing the region; the 
sodium channels are about to open. 
Point 2 is at the peak of the AP; the 
sodium channels are open and ions are 
flowing into the axon. The AP has 
passed by point 1; the sodium channels 
are inactivated, and the membrane is 
hyperpolarized. 

How can this action potential be propagated along the neuron?
When the sodium channels are opened, sodium ions rush in; once 
inside they cause nearby regions of the neuron to become 
depolarized by moving laterally through the axon. This, in turn, causes 
the opening of more voltage-gated sodium channels in those regions. 
Thus, the sodium channel activation moves in a wave-like fashion: the 
action potential is propagated down the length of the neuron, from 
its input source at the dendrites, to the cell body, and then down the 
axon to the synaptic terminals. How does the action potential 
maintain this directional flow that is key to information processing? 
The sodium channels have a mechanism that avoids "back 
propagation" of the action potential, which would result in a 
confused signal. After opening, the sodium channels become 
inactivated as the potential becomes more positive, and they cannot 
open again until they are "reset" by hyperpolarization at the end of 
an action potential. This brief period of sodium channel inactivation, 
called a refractory period, prevents bidirectional propagation of the 
action potential, constraining it to go in only one direction. 

Myelin Speeds Up Thought 

Most neurons have a fatty outer layer called myelin, which insulates 
and protects the axons of neurons. In this way, myelin is like the plastic 
that surrounds electric wires. Myelin is actually made up of two special 
classes of glial cells, called the oligodendroglia and Schwann cells, 
which wrap themselves around the axon much like a jellyroll. Between 
these cells there are small gaps in the myelin sheath called the nodes
of Ranvier. Action potentials are able to jump from one node to the
next one down the neuron incredibly rapidly. For this reason, impulses
will travel down a myelinated neuron faster than they will down an
unmyelinated neuron. In myelinated neurons, action potentials usually
travel at over 100 meters per second, which is about a third of the speed of
sound. In about one-hundredth of a second, an action potential can 
travel from the brain to the base of the spinal cord of an adult. 
Though seemingly instantaneous, this rate is still on the order of a 
million times slower than electricity. 
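
The timing figures can be checked with simple arithmetic. In the sketch below, the one-meter path length and the speed of sound in air (about 343 meters per second at room temperature) are assumed round values; only the conduction speed comes from the text.

```python
CONDUCTION_SPEED_M_PER_S = 100.0   # myelinated axon, from the text
SPEED_OF_SOUND_M_PER_S = 343.0     # assumed: dry air at about 20 degrees C
BRAIN_TO_CORD_BASE_M = 1.0         # assumed rough adult path length


def travel_time_s(distance_m, speed_m_per_s):
    """Time for a signal to cover a distance at constant speed."""
    return distance_m / speed_m_per_s


signal_time_s = travel_time_s(BRAIN_TO_CORD_BASE_M, CONDUCTION_SPEED_M_PER_S)
fraction_of_sound = CONDUCTION_SPEED_M_PER_S / SPEED_OF_SOUND_M_PER_S

print(f"travel time: {signal_time_s * 1000:.0f} ms")
print(f"fraction of the speed of sound: {fraction_of_sound:.2f}")
```

At 100 meters per second the one-meter trip takes about one-hundredth of a second, which matches the figure given above.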

Several degenerative diseases are due to the loss of myelin in certain 
neurons. The loss of muscle coordination that people with multiple 
sclerosis face is due to the degeneration of the myelin sheath in classes 
of neurons that are involved in the movement of muscles. The disease 
is suspected to be an autoimmune disorder — the immune system 
attacks the myelin sheaths. While MS usually strikes first in early
adulthood, many other diseases that are due to myelin degeneration 
occur in infancy or early childhood. 

Across the Synapse 

How is information transferred from one neuron to the next? 
Neurons communicate at their meeting points, called synapses; the 
small gaps separating the neurons are referred to as the synaptic 
space. These synapses are not merely gaps but are functional links 
between the two neurons. Signals are transferred in only one 
direction across the synapse. The neuron that transmits information 
when it fires is called the presynaptic neuron. The synaptic 
terminals of the presynaptic neuron are on one side of the synapse; 
the dendrites of the other neuron, the postsynaptic neuron, are on 
the other side. Presynaptic and postsynaptic are relative adjectives; a
postsynaptic neuron at one synaptic connection can be a presynaptic 
neuron at another synapse. 

Synapses can be either chemical or electrical. An electrical synapse is 
what is often called a "gap junction," in which the membranes of two 
neurons are continuous at tiny spots, making the cells electrically 
contiguous. Gap junctions, which are not unique to neurons, allow for
very rapid communication. No chemical intermediary is involved
in an electrical synapse. In the case of chemical synapses, however, 
chemicals called neurotransmitters are released from a presynaptic 
neuron, and dock with receptor proteins on the postsynaptic neuron. 
Such binding causes the shape of the protein to change and ion 
channels to open, much like the voltage-gated channels open in 
response to membrane potential changes (Fig. 3). We will discuss
neurotransmitters in more detail below. Neurons are typically separated 
by about twenty to thirty nanometers in chemical synapses. Electrical 
synapses are more rapid than chemical ones but chemical synapses are 
easier to modulate. In vertebrates and many invertebrates, chemical 
synapses are more common than are electrical ones. 

The action of the presynaptic neuron is referred to as an "all or none" 
response. A neuron can only fire or not fire; there is no "slightly 
activated" signal from a neuron. Whether or not a neuron will fire an 
action potential (that is, send a signal down its axon to be received
by other neurons) depends on how many inputs it is receiving. It also
depends on the nature of each input signal, excitatory or inhibitory,
at each synapse. The net total of those signals
determines whether the neuron will become excited, or depolarized,
enough to fire an action potential and release neurotransmitter from 
its axon terminals. 
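
The "net total" idea can be written as a small calculation. The sketch below is a deliberate simplification: it adds the inputs linearly and ignores their timing and location, and the threshold of -55 mV and the size of each input are assumed values for illustration only.

```python
RESTING_MV = -70.0     # resting potential quoted earlier in this unit
THRESHOLD_MV = -55.0   # assumed firing threshold


def fires(input_changes_mv, resting_mv=RESTING_MV, threshold_mv=THRESHOLD_MV):
    """Return True if the summed inputs depolarize the neuron to threshold.

    Positive entries are excitatory inputs and negative entries are
    inhibitory inputs, each given as a change in millivolts.
    """
    return resting_mv + sum(input_changes_mv) >= threshold_mv


few_excitatory = [4.0, 5.0, 3.0]                 # reaches -58 mV
more_excitatory = [4.0, 5.0, 3.0, 6.0]           # reaches -52 mV
with_inhibition = [4.0, 5.0, 3.0, 6.0, -8.0]     # pulled back to -60 mV

for inputs in (few_excitatory, more_excitatory, with_inhibition):
    print(inputs, "fires" if fires(inputs) else "does not fire")
```

The output is a yes-or-no decision, which mirrors the all-or-none response: a neuron pushed just past threshold sends the same signal as one pushed far past it.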

Also recall that a signal traveling through the brain often involves 
many neurons, each making many connections. This
interconnectedness gives rise to the extraordinary complexity of the 
brain. The activation of a single sensory neuron could quickly lead to 
the activation or inhibition of thousands of neurons. 

Neurotransmitters and Receptors 

Neurotransmitters are usually small molecules, such as amino acids 
(e.g., glutamate and aspartate) and amines (e.g., dopamine, serotonin, 
and histamine). Some neurotransmitters stimulate neurons to fire, 
while others inhibit firing. The effect of the neurotransmitter comes 
about by its binding with receptor proteins on the membrane of the 
postsynaptic neuron. Each neurotransmitter binds specifically in a lock- 
and-key mechanism to its type of receptor. Neurons in different 
pathways will often have different types of receptors in a given family. 
For example, dopamine binds to dopamine receptors, but there are 
about a dozen subtly different dopamine receptors. Neurobiologists 
think that the human nervous system uses at least fifty 
neurotransmitters, but about ten carry out most neurotransmission. 
Many of these neurotransmitters are highly conserved in other 
organisms. Most neurons release only one type of neurotransmitter. 

Neurotransmitters are released in a process called exocytosis. When 
the action potential reaches the end of an axon the depolarization 
causes calcium channels to open. The calcium causes synaptic vesicles
that carry the neurotransmitter to fuse with the cell membrane. This
fusion allows the neurotransmitter to be released into the synapse.
Although exocytosis occurs in many cell types, neurons use a
specialized form in which calcium causes a chain of events that
culminates in fusion of the vesicles.

Figure 3. Synaptic vesicles fuse with
the presynaptic membrane to release
neurotransmitter into the synaptic
space. Here, they bind with
neurotransmitter receptors in the
postsynaptic membrane.

Photo-illustration: Bergmann Graphics

There are two general categories of receptor proteins: ionotropic and 
metabotropic. Activation of ionotropic receptors causes membrane ion 
channels to open or close. In contrast, activation of metabotropic 
receptors involves an intracellular biochemical cascade. Such a cascade 
may end with the opening or closing of ion channels or other 
intracellular effects. 

As long as the neurotransmitter remains in the synapse, it will continue 
to bind its receptors and stimulate the postsynaptic neuron. At some 
point the signal is no longer needed. Moreover, continual stimulation can 
injure some neurons. So, halting the stimulus is just as important as the 
appropriate starting of the stimulus. How does the neurotransmitter 
leave the synapse? There are several ways, such as diffusion away from 
the synapse or breakdown of the neurotransmitter by specific enzymes. 
Another common mode, called reuptake, involves specialized 
molecules present on the membrane of the presynaptic neuron. These 
molecules, called neurotransmitter transporters, have receptor sites 
that will bind to the neurotransmitter and actively transport it out of 
the synapse, back to the presynaptic neuron. That neuron can then 
reuse the neurotransmitter. The action of several drugs takes place at 
the reuptake stage. 

Neurotransmitters, Psychoactive Drugs, 
and the Reward Pathway 

Drugs that have effects on the central nervous system are known as 
psychoactive drugs. Both therapeutic drugs
(e.g., Ritalin, Prozac, and Paxil) and recreational drugs (e.g., alcohol,
cannabis, cocaine, and nicotine) affect the firing of certain neurons by
changing the action of various neurotransmitters or receptors. Not all drugs have
specific modes of action; alcohol, for example, has many and varied 
effects. We will focus, however, on a few examples of those drugs that 
have specific effects. 

Humans and many other animals engage in many activities from which 
they derive pleasure. Researchers working with various animals have 
shown that there are regions of the brain, such as the ventral 
tegmental area, that are more active when animals engage in 
pleasurable acts. When researchers stimulate these areas 
experimentally, the animals will perform various tasks in order to 
receive further stimulation. Hence, the neural pathway that comprises those
regions has been called the reward pathway.

Like many drugs, nicotine from tobacco products acts on the reward 
pathway. This drug, however, is unusual in that it directly affects the 
dopamine receptor in the reward pathway's neurons. Unlike the action 
of most drugs, no intermediary steps are involved: nicotine binds to 
the receptor and stimulates the postsynaptic neuron. The 
overstimulation of the postsynaptic cell, however, also has effects at 
the cellular level. Over time, it leads to a decrease in the number of 
dopamine receptors being expressed and inserted into the membrane, as
well as a change in the shape of the cell. The reduction of receptors is 
referred to as desensitization. When the nicotine is removed, because
there are fewer receptors on the postsynaptic cell, more dopamine
than normal is required for proper stimulation of the postsynaptic neuron.
Addiction can result because nicotine becomes needed just to maintain 
the normal stimulation of the postsynaptic cells. 

Allelic variation at the dopamine receptor gene appears to affect one's 
likelihood of becoming addicted to nicotine. Individuals who have the
A1 allele have fewer dopamine receptors than those who do not have
the allele. These individuals also have more difficulty in quitting
smoking and are more likely to exhibit other addictive and compulsive 
behaviors. The genetic components of many types of addiction are the 
topic of intensive research — and often heated debate. 

Cocaine also works on dopamine and the reward pathway but does so 
in a different way. Recall that some neurotransmitters are normally 
taken up by the presynaptic neuron by reuptake receptors, or
transporters, in the presynaptic membrane (Fig. 4). The molecular
structure of cocaine is such that it can block the binding site for 
dopamine on its reuptake receptor. Because this cell is now impaired in 
the reuptake of dopamine, an excess of dopamine builds up in the 
synapse. This excess leads to overstimulation of the postsynaptic 
neuron. Because the action is occurring in the reward pathway, 
overstimulation leads to euphoria. The effects of overstimulation of 
the postsynaptic cell by cocaine are much the same as those of 
nicotine: the reduction of the number of receptors leads to 
desensitization and the possibility of addiction. 
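
The buildup of dopamine when reuptake is blocked can be illustrated with a toy calculation. The sketch below is not taken from this unit: it assumes dopamine is released into the synapse at a constant rate and cleared by transporters at a rate proportional to its concentration. The function name, parameters, and numbers are hypothetical and in arbitrary units.

```python
def steady_state_dopamine(release_rate, clearance_rate, fraction_blocked):
    """Steady-state synaptic dopamine in a toy one-compartment model.

    Assumed (illustrative) model:
        d[DA]/dt = release_rate - clearance_rate * (1 - fraction_blocked) * [DA]
    The level settles where release equals reuptake.
    """
    if not 0 <= fraction_blocked < 1:
        raise ValueError("fraction_blocked must be in [0, 1)")
    # Blocking a fraction of the transporters slows clearance proportionally.
    effective_clearance = clearance_rate * (1 - fraction_blocked)
    return release_rate / effective_clearance


# Made-up numbers: no blocker versus half of the transporters blocked.
baseline = steady_state_dopamine(release_rate=1.0, clearance_rate=2.0, fraction_blocked=0.0)
with_blocker = steady_state_dopamine(release_rate=1.0, clearance_rate=2.0, fraction_blocked=0.5)
print(baseline, with_blocker)
```

In this simplified picture, blocking half of the transporters doubles the steady-state dopamine level. Real synapses are far more complex, so the numbers are only meant to show the direction of the effect.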



Figure 4. Left: Dopamine in the
synaptic space binds to dopamine 
receptors on the postsynaptic cell. 
Dopamine transporters in the 
presynaptic membrane take up the 
dopamine molecules from the synaptic 
cleft and return them to the presynaptic 
cell. Right: Cocaine blocks the reuptake 
of dopamine, leading to molecular 
changes that contribute to addiction. 

There have been concerns that Ritalin (methylphenidate), used for
treatment of attention deficit hyperactivity disorder (ADHD), is
chemically similar to cocaine. Indeed, Ritalin increases dopamine levels 
by interfering with reuptake. Moreover, Ritalin and cocaine compete 
for the same receptor site. One crucial difference between these two 
drugs is that Ritalin acts much more slowly than cocaine. While 
cocaine's effects on dopamine levels occur within seconds, the response 
from Ritalin (when administered in pill form) takes about an hour. 
Some studies suggest that, far from leading to addiction, Ritalin 
treatment in childhood may be associated with decreased risk of drug 
and alcohol use later on. Other studies, however, suggest that Ritalin 
may be a gateway drug: by using it, teens may be more willing to 
experiment with other drugs. As of 2003, the consequences of Ritalin
treatment remain unresolved (Fig. 5).



Figure 5. The chemical structures of
dopamine, Ritalin, and cocaine are
similar: they all bind at the
dopamine transporter, affecting 
reuptake of dopamine. 

Prozac and Serotonin Reuptake

Soon after it was released to the market in 1988, Prozac (fluoxetine 
hydrochloride) became the most prescribed drug to treat depression. It 
and several other antidepressants inhibit the reuptake of serotonin, a 
neurotransmitter that affects mood, sleep, and appetite. These drugs 
are called selective serotonin reuptake inhibitors (SSRIs) because, 
unlike older antidepressants, they have little effect outside of 
serotonin reuptake. By inhibiting the reuptake of serotonin, Prozac
and other SSRIs increase the level of serotonin in the synapses. The
increased levels of this neurotransmitter generally result in an improved
mood. Depressed patients often have lower than normal levels of serotonin.

Cannabis, the Cannabinoid Receptors, and Endocannabinoids

The active ingredient of marijuana, from the cannabis plant, is THC 
(delta-9-tetrahydrocannabinol). This chemical exerts its effects on the 
brain by binding to receptors called the cannabinoid receptors. 
Scientists have identified two cannabinoid receptors (CB1 and CB2), 
and evidence suggests that there may be others. Although CB1 is
found in many regions of the brain, CB2 is present only in certain cells
of the immune system. Because CB1 is present in several brain
regions, THC can have manifold effects. For instance, THC may affect
memory formation. CB1 is prevalent in the hippocampus, a region of 
the brain strongly associated with memory. By binding to and 
activating CB1, THC decreases activity of neurons in the hippocampus 
and interferes with the proper function of that region, which may 
translate to an interference with memory formation. 

The human body does not produce THC, so why would there be 
receptors that can bind it? During the 1990s researchers discovered 
that the body makes chemicals, such as anandamide, that can bind to 
the cannabinoid receptors. The function of these chemicals, called 
endocannabinoids, and their receptors is still unknown. To investigate 
the role of the CB1 receptor, scientists have studied mutant mice that 
lack the receptor. Compared with normal mice, these mice have a 
decreased appetite, are less active, and have a reduced lifespan; 
however, the mice have an enhanced memory. 

The CB receptors have recently been associated with some beneficial 
actions, such as pain relief and extinguishing some fear behaviors. THC
has even been prescribed as medication in some states for pain relief
in patients with various diseases, including glaucoma, AIDS, and cancer. 3

The Molecular Basis of Learning and Memory

It is clear that an understanding of mechanisms at the level of the 
synapse explains changes in our behaviors, like movements. But 
what about longer-term changes associated with learning and 
memory? Can they be understood in molecular terms, too? Memory, 
and thus learning, involves molecular changes in the brain. During 
the last few decades, researchers have started to map the molecular 
processes involved in memory formation. They have been 
increasingly able to link the ability to remember with physical 
changes in the structure of neurons. 

One important change that occurs in memory formation is long-term 
potentiation (LTP). This phenomenon involves the long-term
modification of synaptic communication. Under normal
circumstances the rate at which a postsynaptic neuron fires depends on 
how much stimulation it receives from presynaptic neurons. Once the 
increased stimulation has stopped, the postsynaptic neuron will return 
to its normal rate of firing. In LTP, however, the postsynaptic neuron 
will continue to fire at an elevated rate, even after the increased 
stimulation has subsided. It seems to become more sensitive — or gives 
a bigger reaction by firing more action potentials — to a given 
stimulus. How does this happen? 

Glutamate is the neurotransmitter involved in LTP. Glutamate can bind 
to several different types of ionotropic receptors, including the NMDA-
(N-methyl-D-aspartate) and AMPA- (alpha-amino-3-hydroxy-5-methyl-4-isoxazolepropionate)
type glutamate receptors, each of which opens a
specific type of channel within the receptor proteins. Both channels are 
involved in memory formation. The NMDA channel requires both 
glutamate and depolarization from another source to open. Why? The 
molecular mechanism is as follows. Normally, at negative potentials, 
positively-charged magnesium ions plug the pore of the NMDA 
channel. While glutamate may "open" the pore, the ions cannot travel 
through the channel due to the magnesium block. When the
membrane is depolarized, however, the inside of the cell becomes 
more positive, and the magnesium ions are no longer driven into the 
channel. Thus, the block is relieved, allowing sodium and calcium ions 
to flow in. 

So, this mechanism allows the NMDA-type glutamate receptor to act as 
a "coincidence detector." When the neuron receives input from only
one source (another neuron), glutamate binds to and opens both
NMDA- and AMPA-type receptors (Fig. 6). Because the
neurotransmitter arrives at a resting, negatively charged, postsynaptic 
membrane, magnesium ions prevent flow through NMDA channels. 
When, however, stimulation of a neuron occurs simultaneously from 
more than one source — say several other neurons — some glutamate 
will bind NMDA receptors in parts of the neuron that are already 
depolarized, or less negatively charged. 

Where does this voltage change come from? Recall that once an action 
potential has started, it spreads from its source throughout the entire 
membrane of the neuron in a wave-like fashion; thus, other dendrites 
may be "pre-depolarized" before glutamate binds. In this case, the
block by magnesium is relieved and the NMDA channel also passes 
ions. While AMPA channels can pass only sodium ions in, NMDA 
channels also pass calcium. This calcium permeability gives the NMDA 
channel its ability to trigger LTP. 
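
The coincidence-detector logic described above can be summarized as a small truth table. The following sketch is a deliberately simplified illustration: it treats glutamate binding and depolarization as yes/no conditions, and the function names are hypothetical rather than drawn from any library or from this unit.

```python
def ions_entering(glutamate_bound, membrane_depolarized):
    """Return the set of ions entering through AMPA and NMDA channels.

    Simplified rules taken from the text:
    - AMPA channels need only glutamate and pass sodium.
    - NMDA channels need glutamate AND a depolarized membrane, because
      magnesium plugs the pore at negative potentials. When open, they
      pass calcium as well as sodium.
    """
    ions = set()
    if glutamate_bound:
        ions.add("sodium")  # AMPA channels open
        if membrane_depolarized:
            ions.add("calcium")  # magnesium block relieved, NMDA passes ions
    return ions


def can_trigger_ltp(glutamate_bound, membrane_depolarized):
    """LTP is triggered only when calcium enters (illustrative assumption)."""
    return "calcium" in ions_entering(glutamate_bound, membrane_depolarized)


# Print every combination of the two conditions.
for glutamate in (False, True):
    for depolarized in (False, True):
        print(glutamate, depolarized, can_trigger_ltp(glutamate, depolarized))
```

Only the case in which both conditions hold lets calcium in, which is why the receptor is described as detecting a coincidence of two inputs.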

Now that we have examined the requirements for LTP, what is the 
effect? When calcium ions rush in, they set off an intracellular 
signaling cascade that can involve dozens of molecules. Speculation 
about the identity and functions of these molecules has been the 
subject of intense scientific inquiry since the early 1990s — it was 
perhaps the most studied aspect of neuroscience during that "decade 
of the brain." 

So how could this intricate electrical mechanism act to form new 
memories? LTP, like learning, is not just dependent on increased 
stimulation from one particular neuron but on a repeated stimulus from 
several sources. It is thought that when a particular stimulus is 
repeatedly presented, so is a particular circuit of neurons. With 
repetition the activation of that circuit results in learning. Recall that the 
brain is intricately complicated. Rather than a one-to-one line of 
stimulating neurons, it involves a very complex web of interacting 
neurons. But it is the molecular changes occurring between these 
neurons that appear to have global effects. LTP can lead to strengthened 
synapses in a variety of ways. One such way, as discussed in the video, is 
by the phosphorylation of glutamate receptor channels, which is 
accomplished by a calcium-triggered signaling cascade. This results in 
those channels passing more ions with subsequent stimulation, 
strengthening the signal to and from the neuron. 

But more permanent changes (long-term memory) require the
synthesis of new proteins. In a variety of organisms, including flies
(Drosophila) and humans, one protein, CREB (cyclic-AMP response
element binding protein), seems to be involved in the steps that
facilitate this new protein expression. When calcium flows in through
NMDA channels, one of the molecules it activates is CREB. In turn,
activated CREB acts as a transcription factor (see the Genes and
Development unit) that activates the expression of other genes. This
gene expression can lead to the production of more ion channel
receptors, as well as structural proteins like actin, which cement the
synaptic connection between two repeatedly communicating neurons.

Figure 6. Two hippocampal neurons,
labeled with green fluorescent protein,
viewed with confocal microscopy. Such
neurons release and sense glutamate,
and engage in long-term potentiation
(LTP). Note the synaptic connections
between the lateral processes of the
two neurons.
Courtesy of Rick Huganir, PhD.

Mutant mice lacking the NMDA receptors show severe deficiencies in 
memory tasks. On the other hand, researchers have genetically
engineered mice (see the GMOs unit) that have more of the NMDA
receptors. These mice, dubbed "smart mice" by the popular press, are
substantially better at several memory tasks than are normal mice. 

Memory and the Hippocampus 

Psychologists have long argued that there are many different types of 
memory. These can be classified by many criteria, based on decades of 
experimental research and the different memory defects seen in people 
who have suffered brain damage. Scientists have agreed that memory 
can be viewed in temporal terms; that is, there is a short-term memory, 
with a limited capacity for about a dozen items, and a long-term 
memory, to which these items are presumably transferred for 
"storage." Short-term memory seems to be much more vulnerable to 
loss due to trauma than does long-term memory: people may even lose 
the ability to form new memories, while their ability to remember their 
entire lives before an accident remains intact. This memory defect is 
exemplified in the movie Memento (2000), in which a widower avenges 
his wife's murder — during which he suffered brain damage — over 
and over again. Individuals with this condition of "anterograde
amnesia" usually have severe damage to their hippocampus. As
Kempermann points out, the hippocampus is not the equivalent of the 
brain's hard drive but rather a gateway, "a structure, through which all 
information must pass, before it can be memorized." 4 

It is widely agreed that while the hippocampus is undeniably 
important for memory, the "recording" of information into long-term 
memory involves plasticity, or physical changes, in multiple regions 
throughout the entire nervous system. Another interesting distinction 
that scientists have made in types of memory is between declarative 
memory, which allows you to remember facts and is extremely 
complex, and reflexive memory, which usually consists of learning by 
repetition and often involves motor learning. While declarative 
memory can be reported, reflexive memory is exhibited by 
performance of a task and cannot be expressed verbally. It is now 
thought that the two types of memory may involve two entirely 
different neuronal circuits. 

The hippocampus plays a major role in spatial learning and memory in 
a number of animals. Research with black-capped chickadees and other 
species of birds has shown that when the hippocampus is removed, the 
birds still store food but cannot recall where they stored it. Moreover, 
bird species that rely heavily on stored food as a winter resource in 
general have larger hippocampi than those species that don't. 

Studies of cab drivers in London have provided fascinating information 
about the role that the hippocampus plays in spatial memory. London 
cab drivers are known for their navigational skills and knowledge of 
the streets of London. To learn how to navigate the streets of the city, 
would-be cab drivers undergo "the Knowledge," a rigorous training 
that can take two years to complete. Recent studies using magnetic 
resonance imaging (MRI) demonstrate that the hippocampi of the 
London cab drivers are somewhat different. Specifically, the posterior
region is significantly larger and the anterior region is significantly 
smaller in the cabbies when compared with control subjects. Other 
studies have found that the posterior region is active during tasks 
involving spatial memory. It is possible that the cabbies come 
disproportionately from those individuals with excellent spatial 
memories and corresponding larger posterior regions of the 
hippocampus. There is further evidence, however, that suggests that 
the memory work of the cabbies has altered their hippocampi. Those
cab drivers who have been working the longest tend to have larger
posterior hippocampi than more recently hired cabbies. Furthermore,
other imaging studies show that the right hippocampus is activated in 
the cab drivers when they are asked to remember complex travel 
routes but not when they are asked to provide information about 
famous landmarks. 5 

Neuronal Stem Cells 

What neuronal processes have led to the changes in the hippocampi of 
London taxi drivers? Perhaps this is achieved by neurons migrating from 
one region to the posterior hippocampus? Another intriguing possibility 
is that the changes are the result of new neurons going to the region. 

New neurons? Don't we have our complete store of neurons by early 
childhood? That previously dominant paradigm has been found to be
incorrect. In the past two decades, researchers have shown that
neurons are continually produced in a variety of animals, including 
humans. It isn't that neurons divide. They don't. Instead, the brain 
maintains a reservoir of stem cells that are capable of generating new 
neurons (neurogenesis). One area of the brain where stem cells have 
been found is the hippocampus. 

The discovery of stem cells and neurogenesis began with basic research 
with songbirds. During each breeding season male songbirds need to 
recall their mating song. Starting in the 1980s researchers noted that 
the number of neurons in certain areas of the brain (especially the 
hippocampus) would increase in male birds around the start of the 
breeding season. The number of neurons in these areas would 
decrease after the mating season. This striking evidence led other 
researchers to look for neurogenesis in the brains of mammals. Studies 
on rats found substantial neurogenesis. In one part of the 
hippocampus alone nearly 10,000 new neurons are generated each day 
in adult rats. Starting in the 1990s Elizabeth Gould of Princeton 
University found that the adult brains of several species of monkeys 
also undergo considerable neurogenesis. 

Following these animal studies researchers examined whether humans 
have the capacity for neurogenesis. They studied postmortem brain 
tissue from humans, using various stains to determine whether new 
neurons were being generated from dividing progenitor cells. They 
were able to find such new neurons in the hippocampus, showing that 
neurogenesis proceeds throughout life in at least some regions of the 
human brain. 

Engaging in mental and physical activity is one important way elderly 
people can maintain their mental acuity. This aspect of conventional 
wisdom has been vindicated by medical research. Mental and physical 
activity reduces the risk of neurodegenerative disorders and improves 
the prognosis of stroke patients. Yet, we know little about the
molecular mechanisms behind this effect. Studies in mice of 
neurogenesis in the hippocampus, however, point to one possible 
reason why activity keeps the mind sharp. Mice that were exposed
to an enriched environment for the second half of their lives showed a 
dramatic increase in neurogenesis in the hippocampus as compared 
with control subjects. The hippocampi from the mice that received the 
enriched treatment also appeared like those of younger animals. These 
results strongly suggest that activity maintains the proper function of 
the brain by increasing neurogenesis in the hippocampus. 

Elizabeth Gould and other researchers studying neurogenesis think 
that the new neurons generated in the hippocampus are involved in 
modulation of the stress response as well as learning. There are some 
complications, however. Learning enhances neurogenesis but only 
under certain conditions. Moreover, experimental blockage of 
neurogenesis interferes with some types of learning but not others. 

Our understanding of neurogenesis remains far from complete. Yet, 
tremendous progress has been made during the last two decades and 
further progress is expected. In addition to what these studies tell us 
about how the brain works, they may also pave the way toward 
treatment of degenerative diseases like Alzheimer's and Parkinson's as 
well as brain trauma. 

Building from case examples, a neurobiologist and a
neurosurgeon describe the workings of the brain.

Drickamer, L. C., S. H. Vessey, and E. M. Jakob. 2002. Animal behavior:
Mechanisms, ecology, and evolution. 5th ed. McGraw-Hill. 

A university-level textbook on animal behavior that has an
excellent section on the neurobiology of behavior.

Timmons, C. R., and L. W. Hamilton. Drugs, brains & behavior. 
www.rci.rutgers.edu/~lwh/drugs/. 

A short e-book detailing the neuropharmacological effects
of drugs.

Sullivan, J. M. 2002. Cannabinoid receptors. Curr. Biol. 12:R681.
A short guide to recent research on cannabinoids and 
their receptors. 

Glossary



Action potential. The nerve 
impulse, or "firing," of a neuron. 
A traveling wave of depolarized 
voltage that is propagated along a 
neuron. Results in the release of 
neurotransmitter and the 
movement of information to 
another neuron. 

Depolarization. The state in 
which the inside of a neuron 
becomes more positive in voltage 
than it is at rest. 

Hippocampus. A region of the 
brain associated with memory 
formation. 

Hyperpolarization. A state in 
which the membrane potential is 
more negative than is the resting 
potential; occurs transiently at the 
end of an action potential. 

Ionotropic receptors. Receptors
for which neurotransmitter 
binding results directly in an ion 
channel opening or closing. 

Long-term potentiation. An
enduring increase in the strength
of the connection between two 
neurons, which results from 
repeated stimulation of a given 
input pathway. 

Membrane potential. The
difference in voltage between the
inside and the outside of a
neuron; by convention, the outside
is defined as zero.

Neurogenesis. The formation 
of new neurons from precursor 
stem cells. 

Neurotransmitter. A molecule 
that travels across the synapse 
and binds to its receptor on the 
postsynaptic neuron, influencing 
its probability of firing. 



Phosphorylation. The addition 
of a phosphate group to a 
molecule, such as a protein. 

Postsynaptic neuron. At a given 
synapse, the postsynaptic neuron 
is the receiving neuron at its 
dendritic end. 

Presynaptic neuron. At a given 
synapse, the presynaptic neuron 
is the transmitting neuron; its
axonal synaptic terminal forms 
the synapse. 

Resting potential. The resting 
membrane potential of a neuron; 
it is about -70 mV. 

Reuptake. The recapture of 
neurotransmitters from the 
synapse back into the presynaptic 
neuron; accomplished by 
transporters. 

Reward pathway. A pathway 
in the brain that is stimulated 
when an animal is engaged in 
pleasurable activities. 

Synapse. A functional 
connection between two 
neurons where information can 
be exchanged in the form of 
electrical or chemical energy. 

Transcription factor. A protein 
that influences transcription of 
another gene by binding to DNA. 

Voltage-gated channels.
Ion channels in the cell membrane
that open or close in response to 
changes in the membrane voltage. 

X-ray crystallography.
A method for determining the
structure of a molecule, such as 
a protein, based on the diffraction 
pattern resulting from focused 
X-ray radiation onto pure crystals 
of the molecule. 

Sex and Gender



"I think humans like things to be ordered, and they get
bothered about gray areas and when things become 
less clear-cut. But these days I don't think so much in 
black and white about male and female. Now I think of 
it all as being on a spectrum." Dr. Andrew Sinclair 1 

Introduction

Max Beck was born an intersexual, someone with ambiguous 
genitals. Like most babies without a normal penis, he was "assigned" 
the sex of female and underwent plastic surgery to "fix" his genitals. 
He was named Judy and grew up as a girl, a self-described tomboy. In 
his teens, more surgery and female hormone injections turned him into 
a woman — a woman with no sense of gender identity. As a young 
adult he had sexual relationships with males and females, first 
accepting himself as a lesbian, then marrying a man. After divorcing 
his husband, he once again became a lesbian with a partner. Finally he 
received his medical records, which revealed that he was an intersexual 
and had both an X and a Y chromosome. Over a period of two years he 
decided that he could no longer live as a female. He reassigned himself 
as a male, married his female partner, and became the father of a 
child, conceived by his wife using donor sperm. Despite his sex 
assignment as a female at birth, Max was never able to accept his 
gender as female. 

In contrast, Jan Morris was born James Morris, an apparently normal 
male. A successful journalist, author, and mountain climber, she 
married and had five children before she decided in her 30s to change 
her sex to female and her name to Jan. Jan Morris wrote of her sex 
change in the book, Conundrum, explaining that she had always 
known that she was a woman, wrongly born into the body of a man. 
She has continued to be a successful writer and lives harmoniously with 
her former wife. 

The use of pronouns above may strike some as strange. Which pronoun 
should be used in the case of transgendered individuals? This question 
highlights the difficulties our language and culture have in 
confronting issues of sex and gender. We have used the pronoun of 
the individual's final choice of gender. 

What is the difference between sex and gender? Max Beck was born a 
male (albeit with ambiguous genitals), and efforts to change him into 
a girl by surgery and hormones failed to change his gender, his sense 
of identity. In contrast, Jan Morris believes that she was born a woman 
in a man's body. Transsexuals like Jan Morris explain that they must 
change their physical sex because there is no way to change their
gender, which comes from the brain. While there is no general 
agreement on terminology, the Merriam-Webster dictionary defines 
gender as "the behavioral, cultural, or psychological traits typically 
associated with one sex," and defines sex as "either of the two major 
forms of individuals that occur in many species and that are 
distinguished respectively as female or male." This suggests that sex is 
biological while gender is subjective. 



Sex and the Y Chromosome

Except for the sex chromosomes (X and Y), all humans have the same 
set of chromosomes. The karyotype of a human male is 46XY (46 
chromosomes, including one X and one Y), and that of a female is 
46XX (46 chromosomes, including two X chromosomes). In mammals as
a whole, the presence or absence of the Y chromosome usually
determines sex. Individuals with an X chromosome but no Y (45X0) are
female (Turner's syndrome); individuals with two X chromosomes and a
Y chromosome (47XXY) are male (Klinefelter's syndrome).
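
The rule of thumb in this paragraph (a Y chromosome is present, so the individual is usually male) can be written as a one-line test. The sketch below is only an illustration of that rule: the function name and the string encoding of the sex chromosomes are assumptions made for the example, and the rule itself has exceptions, as the word "usually" in the text indicates.

```python
def typical_sex_from_karyotype(sex_chromosomes):
    """Apply the 'Y present means male' rule of thumb.

    sex_chromosomes is a string of X and Y letters (a hypothetical
    encoding chosen for this example), such as "XX", "XY", "X" for a
    single X with no Y, or "XXY".
    """
    letters = sex_chromosomes.upper()
    if not letters or set(letters) - {"X", "Y"}:
        raise ValueError("expected a non-empty string of X and Y letters")
    # Presence or absence of the Y, not the number of X copies, decides.
    return "male" if "Y" in letters else "female"


for karyotype in ("XX", "XY", "X", "XXY"):
    print(karyotype, typical_sex_from_karyotype(karyotype))
```

The four cases printed correspond to the four karyotypes named in the paragraph: 46XX, 46XY, the single X of Turner's syndrome, and the XXY of Klinefelter's syndrome.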

The Y chromosome is considerably smaller than the X chromosome and 
has a much lower density of genes. In fact, the Y has often been called 
a "genetic junkyard." But there are a few rubies among the rubbish of 
that genetic junkyard: the Y chromosome contains genes that are
essential for male fertility and other male characteristics.

Why does the Y chromosome have so few functional genes? 
Evolutionary biologists are still debating the details but they agree 
that the lack of recombination explains the paucity of functional genes 
on the Y. Unlike the twenty-two pairs of autosomes, there is no
recombination between the X and most of the Y chromosome. Genes 
on the part of the Y chromosome that does not recombine will be 
passed from father to son, down a paternal lineage, and will never be 
present in females. The lack of recombination weakens the 
effectiveness of natural selection to weed out bad variants and select 
for good ones. Over many millions of years mutations and random 
genetic drift erode the Y chromosome, turning it into a genetic 
junkyard. In contrast, genes on the X are present in both males and 
females; X chromosomes, like autosomes, recombine in production of 
female gametes. 

About five percent of the Y chromosome does recombine with the X. 
This region, at the tips of the chromosomes, is called the 
pseudoautosomal region because in it the X and Y chromosomes 
behave as autosomes (Fig. 1). The pseudoautosomal region is more
gene-rich than the rest of the Y chromosome. Several of the genes on 
the pseudoautosomal region of the Y have counterparts on X, 
reflecting a common evolutionary ancestor. The genes required for 
male fertility are found in the non-recombining regions of the Y, and 
are not present on X. 

Researchers in David Page's lab have shown that one-quarter of the
Y chromosome consists of eight families of nearly identical nucleotide
sequences, and includes duplicate copies of important genes. Because
these regions are arranged as palindromes, they provide a
mechanism for a kind of internal recombination between the similar
genes on the same chromosome. This process, called gene conversion,
aids in the detection and repair of gene mutations in this part of the
Y chromosome. 2

Figure 1. The Y chromosome is
very small compared to the X 
chromosome. The pseudoautosomal 
regions at the tips contain the 
genetic material on the Y that 
shows similarity to the X 
chromosome. The SRY gene is 
located on the p arm of the Y. 

Paternal Inheritance

The lack of recombination means that the entire non-recombining 
portion of the Y is passed intact from father to son. A male shares the 
same Y chromosome with his father, paternal grandfather, paternal 
great-grandfather, and so on (Fig. 2). Researchers can establish
paternal genetic relationships by comparing small differences 
(polymorphisms) between modern Y chromosomes. The identification 
of genetic markers such as single nucleotide polymorphisms 
(SNPs) and indels (insertions and deletions) in the non-recombining 
regions of the Y provides a tool to study population structure and 
history, genealogy, and human evolution. Because these regions do not 
recombine they change very slowly, so they may be useful in 
identifying stable paternal lineages over thousands of years. Mutations 
occasionally occur in this DNA, however, which are then inherited 
down the paternal line. 
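
Comparing Y chromosomes marker by marker can be sketched in a few lines. The example below is a hedged illustration only: the haplotype strings are made-up data standing in for the alleles at six SNP sites in the non-recombining region, and the function name is hypothetical.

```python
def count_differences(haplotype_a, haplotype_b):
    """Count the marker positions at which two Y haplotypes differ."""
    if len(haplotype_a) != len(haplotype_b):
        raise ValueError("haplotypes must cover the same markers")
    return sum(a != b for a, b in zip(haplotype_a, haplotype_b))


# Made-up alleles at six Y-chromosome SNP sites (not real data).
father = "ACGTAC"
son = "ACGTAC"             # inherits the father's Y intact
unrelated_male = "ACTTGC"  # differs at two of the six sites

print(count_differences(father, son))
print(count_differences(father, unrelated_male))
```

In this toy comparison, males from the same paternal line share identical markers, while males from different lines differ at some sites. Actual studies use many more markers and account for the occasional new mutation.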

Figure 2. Maternal lineages can be
traced through mitochondrial genes, 
which are inherited by males and 
females only from the mother. Paternal 
lineages can be traced through the Y 
chromosome, which is inherited only by 
males and only from the father. 
(M=male and F=female) 

Evolution of the Y Chromosome

The evolutionary ancestor of the sex chromosomes was a pair of 
matched, autosomal chromosomes that acquired sex-determining 
genes on one member of the pair. This occurred about 350 million 
years ago in a reptile-like ancestor. Over time additional genes with 
male-specific functions accumulated in this same chromosome, called 
proto-Y, which then lost its ability to recombine with its counterpart 
chromosome, called proto-X. Four regions of the proto-X
chromosome appear to have been involved in four different
steps that resulted in the loss of recombination with proto-Y. Each of the
four regions accumulated mutations in those non-recombining regions 
of proto-Y at four different times in evolution. Each time 
recombination was lost there was degradation and loss of the non- 
recombining region. Over time this chromosome evolved into Y, losing 
most of its genetic information as a result of the degradation of the 
non-recombining regions of the chromosome. Its partner chromosome 
evolved into the X chromosome. The degeneration of the Y was offset 
at various times by additions of autosomal genes to this chromosome 
(as well as to X), leading to a pattern of loss and gain of genetic 
material over a period of about 170 million years (Fig. 3).

X Inactivation

Having a single copy of any chromosome other than the X or the Y is 
lethal in humans; however, only one X chromosome is needed for 
normal development to occur. Therefore, the evolutionary process that 
resulted in a loss of genes from the Y chromosome would seem to have 
presented a problem. At least two possible mechanisms could balance 
gene expression between the two X chromosomes in females, versus 
only one X in males. Gene activity on the one X present in males 
(relative to the ancestor before the evolution of the XY system) could 
be increased so that these genes produce twice as much in males as in 
females. Alternatively, X-linked genes could have their activity decreased 
in females. The first mechanism is seen in some insects, including 
Drosophila, while mammals use a special variant of the second, called 
X-inactivation. In X-inactivation, female embryos randomly inactivate 
one X chromosome in each cell, resulting in only one functional copy 
of X-linked genes in both males and females. 

X-inactivation requires a locus on the X, called the X-inactivation 
center. At this locus, inactivation occurs in response to a developmental 
cue, which is present only at specific stages of embryo development. 
Inactivation occurs because of a specific type of RNA, which binds to 
one X chromosome, preventing transcription of the genes on this 
particular copy. In addition, enzymes add methyl groups to the DNA of 
the inactive X, resulting in repression of transcription. The inactivated 
X is visible during interphase as a condensed chromosome, 
called a Barr body. It replicates in the S (synthesis) phase of the cell 
cycle later than does the active copy. Inactivation of one of the two X 
copies in a female leaves only one active X chromosome in any cell. An 
individual who has three X chromosomes has two inactivated copies of 
the X, producing two Barr bodies. 
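
The counting rule in the paragraph above (one X stays active and every additional X is inactivated) can be written as a one-line calculation. The Python sketch below is illustrative only: the function name `barr_bodies` and the karyotype labels are assumptions made for this example, and the rule is the simplified one stated in the text.

```python
def barr_bodies(x_count: int) -> int:
    """Expected number of Barr bodies in a cell with x_count X chromosomes.

    Simplified rule from the text: one X remains active and each
    additional X is inactivated, forming one Barr body.
    """
    if x_count < 0:
        raise ValueError("x_count must be non-negative")
    return max(x_count - 1, 0)


# Illustrative labels mapped to the number of X chromosomes they carry.
examples = {"46XY": 1, "46XX": 2, "47XXY": 2, "three X chromosomes": 3}
for label, n_x in examples.items():
    print(f"{label}: {barr_bodies(n_x)} Barr body(ies)")
```

Under this rule a typical male cell shows no Barr body, a typical female cell shows one, and a cell with three X chromosomes shows two, matching the example in the text.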

Because the X is inactivated randomly in cells, one cell could have the 
maternal X inactivated, while the adjacent cell could have the paternal X 
inactivated. This causes a pattern of gene expression called mosaicism, 
which occurs when different alleles of X-linked genes are expressed in 
different cells. A classic example of mosaicism is the female calico cat, 
which inherits an X-linked allele for yellow coat color from one parent 
and an X-linked black allele from the other. One or the other color is 
expressed in patches of the coat that represent cells descending from 
parental cells with either an active maternal X or paternal X. 

Figure 3. The degeneration of the Y occurred in four discrete episodes, 
beginning about 300 million years ago when a reptile-like ancestor 
acquired the SRY gene on one of its autosomal chromosomes. Each of the 
four episodes involved a failure of recombination to occur between the 
X and the Y chromosomes, resulting in subsequent decay of some genes in 
the non-recombining region. 
Adapted from: Scientific American, February 2001, "Why the Y is So Weird" 

Genetic Imprinting 

Some genes are expressed only from the maternal chromosome, while 
other genes are expressed only from the paternal chromosome. The 
second gene copy is silenced during gamete formation in the egg 
(when maternal gene copies are silenced) or the sperm (when paternal 
gene copies are silenced). This is known as genetic imprinting. 
Imprinting occurs in each generation when new egg and sperm cells 
are produced. 

Relatively few genes in humans are known to be imprinted and such 
genes tend to be clustered in the genome. The gene imprinting occurs 
by the addition of methyl groups to the DNA of the silenced gene, 
preventing transcription of the gene. This gene silencing acts in much 
the same manner as mutation or deletion of one copy of a gene, 
except that it is not a permanent, heritable change. If the one 
remaining active gene is deleted or mutated, there is no extra, 
functional copy on the second chromosome; therefore, mutation of 
the single, active copy of an imprinted gene may result in disease. 
Similarly, as a result of an error, cells may receive all or part of a pair of 
chromosomes from a single parent. For imprinted genes, that means 
that the cell receives either two imprinted copies or two active copies. 
If both copies are imprinted, there is no functional gene. Two active 
copies of a gene may also result from a mutation that leads to loss of 
imprinting; neither copy will be silenced. Too many active copies of a 
gene may result in overexpression of a gene, which can cause disease. 
A number of cancers have been associated with failure to imprint 
genes, especially genes that produce growth factors. Overexpression of 
growth factors can disrupt the cell cycle, contributing to uncontrolled 
cell growth and cancer. (See the Cell Biology and Cancer unit.) 
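
The cases described above (normal inheritance, both copies from one parent, and loss of imprinting) differ only in how many active copies of the gene a cell ends up with. The Python sketch below counts them under a deliberately simplified model; the function name `active_copies` and its parameters are hypothetical and are used here only to illustrate the reasoning.

```python
def active_copies(maternal: int, paternal: int, silenced_parent: str,
                  imprinting_intact: bool = True) -> int:
    """Count active copies of an imprinted gene in a cell.

    maternal, paternal: copies of the gene inherited from each parent.
    silenced_parent: "maternal" or "paternal", the copy normally silenced.
    imprinting_intact: False models loss of imprinting (nothing silenced).
    """
    if silenced_parent not in ("maternal", "paternal"):
        raise ValueError("silenced_parent must be 'maternal' or 'paternal'")
    if not imprinting_intact:
        return maternal + paternal
    # Only the copies from the non-silenced parent are expressed.
    return paternal if silenced_parent == "maternal" else maternal


# A gene whose maternal copy is normally silenced:
print(active_copies(1, 1, "maternal"))         # normal inheritance
print(active_copies(2, 0, "maternal"))         # both copies from the mother
print(active_copies(0, 2, "maternal"))         # both copies from the father
print(active_copies(1, 1, "maternal", False))  # loss of imprinting
```

In this model, normal inheritance gives one active copy, two silenced copies give none, and either two paternal copies or loss of imprinting gives two, which is the overexpression case the text links to disease.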

Testis-Determining Factor 

The presence of a Y chromosome is usually necessary and sufficient for 
male development: a 45X0 human is female, while a 47XXY is male. 
It also typically leads to formation of a testis in the mammalian 
embryo — the primary sex-determining event. The testis then 
produces and secretes the male hormones, androgens, resulting in the 
formation of male genitalia. In the absence of Y, the pathway leads to 
development of a female (Fig. 4). Therefore, the Y must contain a 
testis-determining factor. 

The region of the Y chromosome that carries the testis-determining 
factor contains a gene called SRY (sex-determining region Y). Its product binds to 
DNA, acting as a transcription factor that is critical for testis 
production. Scientists studying sex reversal, a difference between the 
chromosomal sex and the phenotypic sex, confirmed the importance of 
SRY. They determined that infertile males who were XX had all 
acquired a particular snippet of the Y chromosome, which was 
translocated to X. That small fragment of the Y carries SRY. Conversely, 
many XY females have a deletion of the part of the Y that includes 
SRY. Introduction of the mouse SRY gene into an XX mouse causes the 
formation of testis and the animal develops as a male anatomically; 
however, it does not produce sperm. Thus, SRY is the testis-determining 
factor, and is the only gene on the Y chromosome that is 
essential for development of male genitalia. Some genes required for 
male fertility are on the Y chromosome, while others are on the X or 
on autosomal chromosomes. The DAZ genes on the Y are essential for 
sperm formation; deletion of DAZ results in male infertility. 

In the first few weeks of development a human embryo develops a 
sexually indifferent gonad, which can become either a testis or an 
ovary. Without SRY to stimulate testis development, the gonad 
becomes an ovary and the embryo develops into a female; the 
development pathways of both male and female are complex, 
however, and are regulated by several gene products (Fig. 4). 

Figure 4. The genital ridge in an embryo is converted to a bipotential 
gonad by the products of the LHX9, SF1, and WT1 genes. This gonad 
develops into an ovary under the influence of the WNT4 and DAX1 gene 
products; it develops into a testis under the influence of the products 
of the SRY and SOX9 genes. The ovary produces cells that make estrogen, 
which causes the Mullerian duct to differentiate into the female 
genitalia. The testis makes two hormones: anti-Mullerian duct factor 
(AMH), which causes the Mullerian duct to regress; and testosterone, 
which causes the Wolffian duct to differentiate into male internal 
organs. Testosterone is also converted into dihydrotestosterone (DHT), 
which is required for development of male external genitalia. 



For example, the product of the DAX1 gene (present on the X 
chromosome) appears to interact with SRY: an excess of SRY leads to 
testis formation, while an excess of DAX1 leads to ovary formation. A 
mutation in DAX1 leads to sterile males but has no effect on females. 
An extra copy of DAX1 in a male leads to a sex-reversed XY female. 
An SOX9 mutation (on chromosome 17) in a male leads to sex- 
reversed XY females, while an extra copy of SOX9 in a female can 
result in a sex-reversed XX male. Conversely, an extra copy of WNT4, 
which is implicated in ovary formation, in a male results in a sex- 
reversed XY female. 

Gene    Function of protein     Action

SRY     Transcription factor    Leads to development of male gonads;
                                XY males lacking SRY show sex reversal
                                to female.

WT1     Transcription factor    Leads to development of male gonads;
                                XY males lacking WT1 show sex reversal
                                to female.

SF1     Transcription factor    Leads to development of male gonads;
                                XY males lacking SF1 show sex reversal
                                to female.

SOX9    Transcription factor    Leads to development of male gonads;
                                XY males lacking SOX9 show sex reversal
                                to female.

DAX1    Transcription factor    Leads to development of female gonads;
                                XY males with a duplication of the gene
                                for DAX1 show sex reversal to female.

WNT4    Signaling factor        Leads to development of female gonads;
                                XY males with a duplication of the gene
                                for WNT4 show sex reversal to female.

Hormones 

Hormones are small molecules that bind to specific target cells to 
modify the response of the cell, usually affecting gene expression. For 
example, estrogen is a small, hydrophobic molecule that binds to 
estrogen receptors. The estrogen-receptor complex then enters the 
nucleus and binds to specific DNA sequences in certain genes, and 
turns on or off transcription of those genes. 

In females the ovaries secrete estrogens and progesterone, which 
are essential for the development of female genitalia during fetal 
development (Fig. 4). These hormones are also required for sexual 
development at puberty, and for pregnancy. The ovaries also 
produce a small amount of testosterone, although much less than 
testes in males produce. 

In males the testes secrete the major androgen, testosterone. Synthesis 
of this hormone increases significantly at puberty, when it is responsible 
for adult sexual development. Androgens are also essential for the 
development of male genitalia during fetal development (Fig. 4). Some 
testosterone is converted to estrogen in males and is important for 
bone formation. 

Animals in utero can be affected by hormones produced by nearby 
siblings of the opposite sex. The placement of an animal, such as a 
mouse, in a litter may have a long-term effect on physiology or 
behavior. Female mice that develop in the uterus between two males 
have shorter fertile periods than do females that develop between two 
females. Male mice prefer to mate with the females that develop in an 
all-female environment. Females who develop between two brothers 
in utero are more aggressive towards intruders than are females who 
develop with two sisters. 

Hormones also affect mature adults. Males and females have receptors 
for estrogens, progesterone, and androgens in various tissues. 
Transsexuals (individuals who have a conflict between their biological 
sex and their perceived gender) must take hormones of the opposite 
sex if they choose to undergo a sex change. Males can develop breasts, 
decrease facial hair production, and change the texture of their skin 
and hair as a result of estrogen and progesterone therapy combined 
with anti-androgen drugs. Conversely, high levels of testosterone can 
have a masculinizing effect on females. Interestingly, individual 
differences in natural hormone levels and hormone sensitivity mean 
that those undergoing a sex change require individualized hormone 
treatment programs. 

Intersex 

For some individuals, determination of biological sex can be difficult. 
Intersex refers to genetically determined differences of the 
reproductive system. This can include differences in internal 
reproductive organs, external genitalia, or karyotype. Mild intersex 
conditions include, in males, a condition in which the urethra opens on 
the underside of the penis or, in females, an enlarged clitoris. Female 
intersexuals (karyotype 46XX) (also called female pseudohermaphrodites) 
have normal ovarian tissue, and have either male or ambiguous 
genitalia. This is usually a result of a change in the fetal adrenal 
glands, leading to production of abnormally high levels of androgens. 
The androgens produce some masculine features in female infants: 
ovaries and uterus form, but the external genitalia appear male-like. 
This accounts for about two-thirds of intersex states. 

Male intersexuals (karyotype 46XY) (also called male pseudohermaphrodites) 
have normal testes with female or ambiguous genitalia. They most 
often result from several different genetic alterations in pathways of 
testosterone synthesis and metabolism. For example, males who have a 
mutation in the gene for the enzyme that converts testosterone to 
dihydrotestosterone have normal testes but have a very small penis 
and a vaginal pouch. In gonadal dysgenesis the testes fail to secrete 
androgens or Mullerian-inhibiting hormone, leading to formation of 
female genitalia. With estrogen treatment, however, these individuals 
will grow into females. A condition called micropenis results from lack 
of androgens later in fetal life; testosterone treatment can stimulate 
masculinizing puberty in these individuals. 

Androgen insensitivity syndrome (AIS) occurs when a male 
produces cells that cannot respond to androgen. The defect is in a 
gene on the X chromosome that produces the androgen receptor. 
Individuals may have complete or partial androgen insensitivity. In 
complete AIS the testes develop in the embryo, and produce 
testosterone and the hormone that inhibits development of female 
internal reproductive organs (Fig. 4). However, because the cells do 
not respond to testosterone, female genitals develop, which may be 
incomplete. The newborn appears to be a female and develops 
external female characteristics at puberty. Lacking internal female 
reproductive organs, though, the individual with AIS does not 
menstruate and is infertile. In incomplete AIS, individuals may appear 
male or female, but there may be abnormalities in the external 
genitalia. Maria Patino, a Spanish runner with complete AIS, was not 
allowed to compete in the 1985 World University Games in Kobe, 
Japan because she failed the gender test. (See the Sex and Gender 
video.) Because of such difficulties in determining sex, the 
International Olympic Committee abolished gender testing in 1999. 

Ethics of Intersex Treatment 

Common medical treatments of intersex babies include: 

1) assignment of gender based on a variety of clinical tests 

2) surgery to remove internal gonads that might become cancerous 

3) reconstruction of external genitalia appropriate for the assigned sex 

4) if necessary, treatment with appropriate hormones 

Unless a penis is present, most intersex babies are assigned female 
because it is not possible to construct a fully functional penis. The 
gender assignment and the surgery are usually done immediately after 
birth, with possible additional surgery after puberty. Today, some 
physicians such as Eric Vilain (featured in the video) recommend 
allowing the child to make his or her own surgical decisions later in 
life. However, most parents choose the surgery earlier because they are 
uncomfortable with the ambiguity. 

The Intersex Society of North America believes that intersex is not an 
abnormality but rather "an anatomical variation from the standard 
male and female types." 5 The Society also believes that the decision 
regarding treatment, if any, should be made by the individual when he 
or she is capable of informed consent. The Society has two objections 
to treatment: 1) treatment assumes that intersexuality is a disease, and 
2) surgery often damages sexual function, while still failing to produce 
anatomically normal genitals. They claim that physicians have 
traditionally failed to communicate to parents the basis for the 
assignment of gender, which is not always (and in the case of XY 
individuals without a penis, never) made based on biological sex. They 
also claim that some physicians have failed to inform parents of 
alternatives and presented elective surgery as essential for health. 

Physicians were once guided by the idea that infants are gender 
neutral at birth, and that normal gender development would follow 
from the environment of the child based on the sex assigned to the 
child. 4 They believed that gender came from "nurture" rather than 
"nature." In recent years individuals, especially intersexuals, have 
protested that the environment does not control gender and, given 
that the treatments are irreversible, they should be delayed until the 
child determines his or her true gender. Others worry, however, that 
children with ambiguous genitals will be ridiculed, causing permanent 
psychological damage that could be avoided. 

Homosexuality 

There is no simple genetic test to differentiate homosexuals from 
heterosexuals. However, many studies of siblings have consistently 
found a much higher incidence of homosexuality in pairs of 
monozygotic (identical) twins compared to dizygotic (fraternal) twins, 
strongly suggesting a genetic component to homosexuality. Although 
there is some suggestive evidence linking variation at specific regions 
of the genome with the propensity for being homosexual, these 
studies are not yet conclusive. 

Sex and Disease 

Sex is an important aspect of human identity, but it is also important in 
health. Women outlive men. In the United States at the start of the 
twenty-first century, a woman's life expectancy at birth is 79 years, and 
a man's is 72. While other countries have greater or lesser average life 
expectancies, female life expectancy is still greater in nearly all 
countries. In fact, for most animals that have been studied, females 
outlive males; female sperm whales outlive males by thirty years on 
average. Many factors contribute to this effect, including genes, 
hormones, and lifestyle factors. Historically, the greatest death risk for 
women has been childbirth. In developed countries, however, this risk 
has decreased markedly in the last century, significantly increasing a 
woman's lifespan. 

Males die more often than females — even before birth. Although 
there are 115 male fertilized eggs for every 100 female, the ratio for 
live births is 104 males to 100 females. Each year after birth, more 
males die than females; so, by age 100 there are only 11 males for 
every 100 females. With improvements in health care, the gap 
between longevity in men and women is decreasing. However, one 
troubling factor contributing to the narrowing of the gap is an 
increase of diseases in women that have been typically considered 
male diseases, especially cardiovascular disease. 
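
The sex ratios quoted above are easier to compare when converted to the share of males at each stage. The Python sketch below does that arithmetic using only the figures given in the text (115, 104, and 11 males per 100 females); the function name `male_fraction` and the stage labels are assumptions made for this example.

```python
def male_fraction(males_per_100_females: float) -> float:
    """Convert a 'males per 100 females' ratio into the male share of the total."""
    return males_per_100_females / (males_per_100_females + 100.0)


# Ratios quoted in the text, expressed as males per 100 females.
stages = {"fertilization": 115, "live birth": 104, "age 100": 11}
for stage, ratio in stages.items():
    print(f"{stage}: {male_fraction(ratio):.1%} male")
```

On these figures the male share falls from a little over half at fertilization and birth to roughly one in ten by age 100.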

There are two aspects of the longevity gap: Why do men die young 
and why do women live so long? Hormones appear to be part of the 
answer to both of these questions. Testosterone may contribute to 
early death in males. The greatest difference in death rates between 
males and females occurs during the teen years, when males 
experience a surge in testosterone. This increase correlates with increases 
in death in males by accidents, homicide, and suicide; however, these 
behavior-related deaths continue to contribute throughout life to 
male mortality more than they do to female mortality. 

While teenage females also die from behavioral causes, the incidence is 
much lower than for males. Female teenagers also experience an 
increase in hormones; these hormones, however, generally correlate 
with increased longevity in women. The strongest evidence for the 
protective effects of female hormones is the increased risk for several 
diseases after menopause, notably cardiovascular disease and 
osteoporosis. In males of all ages, testosterone increases the levels of 
undesirable LDL cholesterol and decreases the levels of the desirable 
HDL cholesterol, increasing the risk for cardiovascular disease. In 
contrast, estrogen appears to have a beneficial effect on cholesterol 
levels. As of 2003, there is much controversy about whether estrogen 
replacement after menopause gives any significant health benefits for 
women. In fact, some studies suggest that replacement therapy may do 
more harm than good. 

Women may also enjoy advantages over men in physiology and 
metabolism, probably because of hormone differences. Women have 
lower metabolic rates than men, likely leading to less oxidative 
damage to cells. Oxidative damage results from free radicals, which 
alter DNA, RNA, and protein in cells. This may explain why oxidative 
damage is linked to diseases such as cancer, Alzheimer's, and 
atherosclerosis. In animal studies, lowering metabolism by decreasing 
calorie consumption has been shown to significantly increase lifespan. 
In addition, because they menstruate, women have less iron in their 
blood. (High levels of blood iron are associated with oxidation of LDL 
cholesterol, which contributes to cardiovascular disease.) 

Women also enjoy a genetic advantage because they have two copies 
of the X chromosome. Mutations in genes on the X chromosome 
typically do not cause disease in females because there is a normal 
copy. Two X-linked diseases are hemophilia and muscular dystrophy. 
Because X-inactivation occurs randomly in each cell, about half of the 
cells of women heterozygous for these conditions would be normal. 
Additionally, the normally inactive copy of the X chromosome in 
females (resulting from X-inactivation during development) may be at 
least partially restored as women age, allowing the inactive X to 
provide a good copy of a gene that was lost or altered by mutation in 
the other X chromosome. 
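
The claim that about half of a heterozygous woman's cells express the normal allele follows directly from random X-inactivation, and a small simulation makes this concrete. The Python sketch below is a simplified model, not a description of real tissue: it treats every cell as making an independent 50/50 choice, whereas in the body the choice is made early in development and inherited by daughter cells, which produces patches. The function name `fraction_with_normal_allele_active` is hypothetical.

```python
import random


def fraction_with_normal_allele_active(n_cells: int, seed: int = 0) -> float:
    """Simulate random X-inactivation in a heterozygous carrier.

    Each simulated cell inactivates either the X carrying the normal
    allele or the X carrying the mutant allele with equal probability.
    Returns the fraction of cells whose active X carries the normal allele.
    """
    if n_cells <= 0:
        raise ValueError("n_cells must be positive")
    rng = random.Random(seed)  # seeded so the run is reproducible
    normal_active = 0
    for _ in range(n_cells):
        inactivated = rng.choice(["normal", "mutant"])
        if inactivated == "mutant":
            normal_active += 1
    return normal_active / n_cells


print(fraction_with_normal_allele_active(100_000))
```

With a large number of simulated cells the result settles close to one half, which is the proportion the text describes.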

One area in which women do not enjoy an advantage over men is in 
autoimmune diseases. Women are more susceptible to these diseases, 
such as systemic lupus erythematosus (lupus) and rheumatoid arthritis. 
There isn't a simple explanation for this increased risk; instead, it 
appears to result from a combination of genetic, environmental, and 
hormonal effects. 

Lifestyle choices also affect longevity. Early in the twentieth century, men 
smoked more than women, a factor that is thought to account for much 
of the gender gap in longevity. As more women began to smoke, the gap 
decreased. Studies indicate that women smokers may have an increased 
risk of lung cancer because they have higher levels of an enzyme that 
produces carcinogens from tobacco smoke. In addition, middle-aged 
women smokers live no longer than do men smokers, suggesting that 
smoking eliminates any health advantage conferred by gender. 

Despite the evidence for gender-based differences in physiology, 
metabolism, disease, and response to certain drugs, women were 
excluded from most medical studies for many years. Why? It wasn't just 
sexism: the difficulty in controlling the monthly cycles of hormones, 
and the concerns about possible pregnancy simply made it easier to 
leave women out of the studies. In 2001 the Institute of Medicine 
issued the report "Exploring the Biological Contributions to Human 
Health: Does Sex Matter?" The report concluded that sex was very 
important in health, and that women should be included in all studies 
of diseases that could affect them. 

With improvements in health care and an understanding of the 
importance of nutrition and exercise, it is likely that the longevity gap 
will continue to decrease. We may eventually understand which 
components of female longevity are the result of sex and which are 
the result of gender. 

Glossary 



Androgen Insensitivity 
Syndrome (AIS). A condition 
resulting from an alteration in a 
gene on the X chromosome that 
normally produces the androgen 
receptor. This condition leads to a 
male (XY) with cells that cannot 
respond to androgens. Because 
the cells do not respond to 
testosterone, female genitals 
develop that may be incomplete. 

Genetic imprinting. Differential 
expression of a gene, depending 
on whether it was maternally or 
paternally inherited. 

Intersex (intersexual). An 
organism with external sexual 
characteristics that have attributes 
of both sexes. Sometimes used to 
include individuals whose 
phenotype cannot be predicted 
from sex chromosome karyotype. 

Mosaic (mosaicism). A tissue 
containing two or more genetically 
distinct cell types, or an individual 
composed of such tissues. 

Sex reversal. A discrepancy 
between an individual's sex 
chromosomes and their sexual 
phenotype. 

Single nucleotide 
polymorphism (SNP). Variations 
in the DNA sequence that occur 
when a single nucleotide (A, T, C, 
or G) in the genome sequence is 
changed. 

X-inactivation. Functional 
inactivation of one copy of the X 
chromosome in cells of females. 

"It is a somewhat sobering thought that we know more 
about the number and position of stars in our galaxy, 
places that none of us will ever visit, than we do about 
the myriad of small animals that live in our backyard. This 
is despite the fact that these creatures eat our plants, 
sometimes bite us but most importantly contribute to the 
cycling of nutrients that sustain life." Mark Dangerfield 1 

Alarmed by the rapid deforestation of the species-rich tropical rain 
forest, prominent environmental biologists such as the Harvard 
ecologist Edward Wilson became increasingly active during the 1980s, 
warning the public about the impending crisis of species loss. In 1986 
Wilson and others convened the National Forum on Biodiversity to 
discuss various problems associated with ecosystem loss. Calling 
attention to the scope of the crisis, that forum's organizers coined a 
new word: biodiversity. 

What is Biodiversity and Why Should We Conserve It? 

The term "biodiversity" was derived from "biological" and "diversity," 
and refers to the total diversity of all life in a given locale — one as 
small as a backyard (or smaller) or as large as the entire planet Earth. 
One example of a biodiversity measurement is bird watchers listing the 
species they see in an area on a given day. Although it is often thought 
of as the number of species in a locale, biodiversity actually has a much 
wider definition and encompasses levels above and below that of the 
species. Wilson described biodiversity as the "totality of hereditary 
variation in life forms, across all levels of organization, from genes to 
chromosomes within individual species to the array of species 
themselves and finally at the highest level, the living communities of 
ecosystems such as forests and lakes." 2 

There is a strong and growing consensus among environmental 
biologists that we are currently in the midst of a biodiversity crisis. 
Human-induced global climate change is now accepted as fact. 
Habitats are rapidly disappearing. Species are going extinct at 
accelerating rates. 

Why should we care about preserving biodiversity? Environmental 
biologists have outlined two general reasons. First the utilitarian 
reasons: We rely on a large number of animal, plant, and fungal 
species for various purposes including food and medicine. In fact, as 
Simon Levin notes, about forty percent of "all prescription drugs in the 
United States contain active ingredients originally derived from 
nature." 3 Moreover, our current knowledge is probably akin to the tip 
of an iceberg compared to the potential medicinal or other benefits 
from species that remain undiscovered. This is particularly true with 
respect to microbes and fungi, which we know less about than plants 
and animals. 

In addition to the benefits from individual species, humans also benefit 
from maintaining healthy ecosystems; perturbing these ecosystems can 
adversely affect human health. For instance, Lyme disease emerged in 
the northeast United States because of changes in the forest ecosystem 
of that region. As the forests became more fragmented, population 
sizes of white-footed mice soared as they were now free from 
competitors or predators, whose populations had declined in the now 
patchy forests. The mice are a source of blood for ticks, which can carry 
the Lyme-disease bacterium. As the diversity of other small, ground- 
dwelling rodents decreased, the mice became an increasingly exclusive 
source of food for ticks, which also feed on humans and other 
mammals. This resulted in a surge in the exposure of humans to the 
bacteria. Through a series of links, forest fragmentation has permitted 
Lyme disease to rapidly become a major health problem in the eastern 
United States. (See the Microbial Diversity unit.) 

In addition to the utilitarian reasons, there are also non-utilitarian 
reasons to preserve biodiversity. Part of the beauty of nature comes 
from the copious diversity of life. Most would agree that a marked 
reduction in the Earth's biodiversity would make it a much poorer 
planet. Related to both the utilitarian and the non-utilitarian reasons is 
that biodiversity is essentially irreplaceable. The creation of new 
species by the natural process of speciation usually occurs in time spans 
of many thousands of generations, far exceeding human lifetimes. The 
biodiversity that disappears on our watch will be lost not only for our 
children and their children, but will remain lost for countless 
generations to follow. In human terms, extinction is forever. Is it moral 
for humans to cause the irrevocable loss of other species if we can 
avoid it? 

The line between utilitarian and non-utilitarian reasons for preserving 
biodiversity is blurred. Some reasons now listed as "non-utilitarian" 
may actually turn out to be utilitarian. Recent research is starting to 
give us hints that as diversity collapses, the ecosystems on which 
we depend may collapse on a global scale as well. The loss of diversity 
from a particular area may have a more drastic consequence than 
simply "it's not pretty anymore" — it may come to mean, "this is now a 
wasteland of biological life." 

Global Species Diversity 

Biodiversity is copious and imperiled, yet it is difficult to measure. This 
difficulty also makes its loss hard to quantify: we know little 
about what we are losing. Despite its importance, knowledge about 
biodiversity lags behind that of other areas of science. The statement 
that opened this chapter echoes those made by several researchers in 
environmental biology who have been frustrated by the lack of 
progress quantifying biodiversity. As we shall see, even the simple 
question "How many species of animals are on Earth?" has not been 
answered, even to within an order of magnitude. 

Before discussing how scientists address the question "How many 
species of animals are on the planet Earth?" let's first ask, "How many 
species of animals have been described?" Even the answer to the 
second question is uncertain. Some uncertainty reflects differences in 
opinion among taxonomists about whether different populations are 
indeed separate species. Some is due to inadequate centralized 
databases. While there are efforts underway to provide a centralized 
catalog of described species, none exists as of 2003. The most current 
estimates are that there are about 1.4 to 1.6 million described species 
of animals. 

What are these 1 .6 or so million species of animals? At least one 
million are insects. A quip from JBS Haldane, polymath and one of the 
founders of the evolutionary synthesis, illustrates the taxonomic 
concentration of biodiversity. When asked about what he could divine 
from nature about the Creator, Haldane replied that he must have had 
"an inordinate fondness for beetles." Haldane's quip was in reference 
to the sheer quantity of beetle diversity. There are roughly 450,000 
different described species of beetles, representing about 30-40% of 
known insect species (Fig. 1). There are about 200,000 described 
species of flies. In contrast, there are only about 9,000 species of birds 
and 4,000 species of mammals. Every year, about 2,400 new species of 
beetles and 1,200 species of flies are described. Thus, the number of 
species of beetles scientists will describe in the next five years alone is 
greater than the total number of current bird species. 
Figure 1. A pie chart of the 
hypothesized distribution of species 
living on earth today. 



From Purvis & Hector, Nature vol. 405 (2000) p 212. Courtesy of Nature Publishing 

In addition to animals and plants, biodiversity also includes a vast 
number of unlabeled species of bacteria, fungi, and protists. These 
contribute to environmental homeostasis by degrading organic matter 
and by making the energy in inorganic matter available for growth. 
Although we often forget these organisms in our consideration of 
biodiversity, they are critical to the balance and resilience of the 
environment, especially with respect to their role in nutrient cycles. 
(See the Microbial Diversity unit.) 
Much of the known biodiversity is located in the tropics. In general, 
species diversity greatly increases as one moves toward the equator: 
specific hotspots of biodiversity are located in tropical rain forests. 
Even though they account for only about seven percent of the land 
area on the planet, tropical rain forests are home to around half the 
known species of animals. 

The Erwin Study 

Prior to 1982 most biologists thought that the number of undescribed 
species was roughly comparable to, or perhaps a few times as many as, 
the number already described. Thus, pre-1982 guesses of the total 
number of animal species were on the order of several million. But no 
one really knew. 

In 1982 Terry Erwin published a provocative report in which he 
estimated the number of species of insects to be not several million but 
an order of magnitude higher — several tens of millions. Erwin 
reasoned that because the tropical forests appeared to contain vast 
unexplored areas of biodiversity he would sample there. Erwin, an 
expert on beetles, fogged the canopy of several trees of the species 
Luehea seemannii with a pesticide. The fogged insects then fell to the 
ground, allowing Erwin to sample them. As he sampled the beetles, 
Erwin kept finding new undescribed species. From the canopy of a 
single species of tree (L. seemannii) Erwin found more than 1,100 
species of beetles. 

How did Erwin arrive at a global estimate for the number of species 
from his "kill 'em and count 'em" experiment? He first estimated that 
160 of those species were specialized to the canopy of that particular 
species of tree. Considering that beetles represent two-fifths of 
species diversity of insects, there should be about 400 (160 x 5/2) 
species of insects specialized to the canopy of L. seemannii. This 
inference assumes that beetle diversity is representative of insect 
diversity for that species. Erwin assumed that about two-thirds of the 
insect species were in the canopy and the rest were elsewhere. Based 
on that assumption, there should be 600 (400 x 3/2) species of insects 
specialized to L. seemannii. There are an estimated 50,000 species of 
trees in tropical forests. If each tree has 600 species of insects 
specialized to it, there should be 30 million species of insects in 
tropical rain forests. 
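
Erwin's chain of extrapolation is easy to retrace as arithmetic. The 
following Python sketch is only an illustration built from the figures 
quoted above; the function and parameter names are our own labels, not 
Erwin's, and each default value encodes one of his assumptions. 

```python
from fractions import Fraction

def erwin_estimate(specialist_beetles=160,
                   beetle_share=Fraction(2, 5),   # beetles as a share of insect species
                   canopy_share=Fraction(2, 3),   # share of insect species living in the canopy
                   tree_species=50_000):          # tropical tree species
    """Retrace the extrapolation step by step (names are illustrative)."""
    # Scale specialist beetles up to all canopy insects: 160 x 5/2
    canopy_insects = specialist_beetles / beetle_share
    # Scale canopy insects up to all insects on the tree: 400 x 3/2
    insects_per_tree = canopy_insects / canopy_share
    # Multiply by the number of tropical tree species
    total = insects_per_tree * tree_species
    return canopy_insects, insects_per_tree, total

canopy, per_tree, total = erwin_estimate()
print(canopy, per_tree, total)  # 400 600 30000000
```

Because every step is a multiplication, an error in any one assumption 
carries through proportionally: halving the number of specialist 
beetles to 80, for instance, halves the final estimate to 15 million. 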

Many authors expressed criticism and reservations about Erwin's 
extrapolations and inferences. Moreover, there have been 
only a few similar studies, none on the same scale as Erwin's. Much of 
the criticism revolves around Erwin's initial guess that 160 of the 
species he collected were specialists. If Erwin had overestimated the 
proportion of specialists, he would be overestimating the total number 
of species. Likewise, had he underestimated the proportion of 
specialists, he would have underestimated the total. Nigel Stork noted 
that Erwin could well be vastly underestimating biodiversity given that 
he did not know how much of the diversity of beetles from 
L. seemannii he had sampled. If Erwin had sampled only one third 
of the beetle diversity, all of his estimates would be three times too 
low. Could there be 80 million species of animals? 100 million? In 
actuality, two decades after Erwin's report, most biologists have revised 
their estimates for the total number of species downward toward the 
10 million range, in part due to studies suggesting that Erwin 
overestimated the proportion of specialists. Still, nobody really knows 
how many species are on Earth. 

Another factor that adds to the uncertainty about overall global 
diversity is our lack of knowledge about smaller organisms. There may 
be hundreds of thousands or millions of mites and fungi that we have 
literally overlooked. Even less is known about microbes. There are 
about 5,000 known species of prokaryotes, but scientists estimate that 
true diversity could range between 400,000 and 4 million species. 

Seven Kinds of Rarity 

Biodiversity is not just the number of species in an area. An area that 
contained twenty species that were all relatively common would be 
more diverse than one that contained nineteen rare species and one 
common species. What do we mean when we say a species is rare? 
Should it just be based on population size? Deborah Rabinowitz 
proposed that we should consider rarity along three different axes. 
The first axis is whether the species has a high or a low population size. 
The second is whether the species has a large or small geographic 
range. The third axis is whether the species can occur in a broad range 
of habitats or whether it is restricted to a more narrow range. 
According to Rabinowitz, a species could be considered common if, 
and only if, it had a high population size, large geographic range, and 
occurred in a broad range of habitats. All other species were rare. But 
they could be rare in different senses. Given that there are three binary 
criteria, there would be two to the third power, or eight, categories 
with only one being common; thus, there would be seven different 
kinds of rarity. Rabinowitz used these criteria to classify wild flower 
species in Great Britain. While thirty-six percent of the species fell into 
the "common" category, the most prevalent category comprised 
species that were widely distributed and had high population sizes, but 
were restricted in their use of habitat. One lesson from this study is 
that many species that are abundant and widespread may be subject 
to extinction if their habitat were degraded. 
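
The count of eight categories can be checked by enumerating them. In 
this Python sketch the axis labels paraphrase Rabinowitz's three 
criteria, and the variable names are our own. 

```python
from itertools import product

# Three binary axes of rarity (labels paraphrased from the text)
AXES = {
    "population size": ("high", "low"),
    "geographic range": ("large", "small"),
    "habitat tolerance": ("broad", "narrow"),
}

# The single combination that counts as "common"
COMMON = ("high", "large", "broad")

# Every combination of one value per axis: 2 x 2 x 2 = 8
categories = list(product(*AXES.values()))
rare = [c for c in categories if c != COMMON]

for combo in categories:
    label = "common" if combo == COMMON else "rare"
    print(dict(zip(AXES, combo)), "->", label)

print(len(categories), len(rare))  # 8 7
```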

What Factors Determine 
Extinction Probability? 

Other factors being equal, species that have high population sizes are 
more likely to persist than those with low population sizes. Very small 
populations are likely to go extinct just by chance in a process called 
demographic stochasticity. As an extreme case, consider a sexual 
species that has just two individuals. If both members of the pair are 
the same gender, it is doomed. Even if the pair does include a male and 
a female, the species cannot persist unless it produces offspring that 
are of both genders. The risk of demographic stochasticity leading to 
extinction is most severe for species with population sizes below about 
10 but still is a hazard up until a population size of around 50 to 100, 
especially for species with low birth rates. Compared with sexual 
species, demographic stochasticity would be less of a factor for 
asexuals like dandelions because a single individual can reproduce 
without the need for others. 
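
The two-individual example generalizes. Assuming each individual is 
independently male or female with equal probability (a simplification), 
the chance that a population of n individuals consists of a single sex, 
and so cannot reproduce sexually, is 2 x (1/2)^n. The Python sketch 
below evaluates it; the function name is hypothetical. 

```python
def prob_single_sex(n, p_female=0.5):
    """Chance that all n individuals are the same sex.

    Assumes each individual's sex is independent, with probability
    p_female of being female (an idealization, not a field estimate).
    """
    return p_female ** n + (1 - p_female) ** n

for n in (2, 5, 10, 50):
    print(n, prob_single_sex(n))
```

The probability is one half for a pair and falls to about 0.2% for ten 
individuals. This captures only one source of chance; random variation 
in births and deaths adds further risk. 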

Species with population sizes that number in the hundreds to a few 
thousand, while not at risk for extinction due to demographic 
stochasticity, still face other risks. The random evolutionary force of 
genetic drift reduces genetic variation every generation. The strength 
of genetic drift is inversely proportional to population size. Thus, 
species with lower population sizes generally have less genetic 
variation than their more numerous counterparts. Species that have 
little genetic variation are at risk of being wiped out by disease. They 
are also less able to respond to other changes such as global warming. 
Although there is some disagreement, the consensus is that species 
with populations above 5,000 are probably safe from extinction 
because of these genetic factors. 
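
A standard idealization from population genetics, not taken from this 
text, makes the dependence on population size explicit: in a randomly 
mating diploid population of constant size N, expected heterozygosity 
(one common measure of genetic variation) shrinks by a factor of 
1 - 1/(2N) each generation. The Python sketch below applies that 
formula; the starting value of 0.5 and the 100-generation horizon are 
arbitrary choices. 

```python
def expected_heterozygosity(h0, n, generations):
    """Expected heterozygosity after drift in an idealized population.

    h0: starting heterozygosity; n: constant diploid population size.
    Ignores mutation, migration, and selection (assumptions).
    """
    return h0 * (1 - 1 / (2 * n)) ** generations

# Compare small, medium, and large populations over 100 generations
for n in (50, 500, 5000):
    print(n, round(expected_heterozygosity(0.5, n, 100), 3))
```

The smaller the population, the faster variation is lost, which is the 
sense in which drift is stronger in small populations. 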

Even species with very large population sizes can go extinct. For 
instance, a species faces extinction if its habitat is lost and it cannot 
find a suitable replacement. One striking example is that of the 
passenger pigeon. During the early 1800s the passenger pigeon 
(Fig. 2) had a population size in the billions, on the order of the 
current human population. Overexploitation by hunters and habitat 
degradation caused its numbers to rapidly dwindle. As its numbers 
decreased, the species became vulnerable to the genetic factors listed 
above and then demographic stochasticity. In September 1914, as 
World War I was beginning, the last passenger pigeon died in captivity. 
This species went from very abundant to extinct in a century. 

Keystone Species and the Diversity- 
Stability Hypothesis 

Not all species are equal with respect to their effects on other species. 
Starfish feeding in the intertidal zone clear an area of barnacles 
and mussels. These barnacles and mussels, without predation by the 
starfish, would come to dominate the community. In a classic 1966 
study Robert Paine removed starfish from enclosures. In those 
enclosures where the starfish were removed the number of species in 
the community dropped from fifteen to eight. Paine called starfish a 
keystone species, one whose presence has a dramatic effect on 
species diversity. 

Prior to 1973 most ecologists thought that more diverse ecosystems 
would be more stable than would ones with fewer species. This 
general belief, what has become known as the diversity-stability 
hypothesis, was based on a variety of observations but not really 
tested. One such observation was that cultivated land that had 
simplified ecological communities was more subject to species 
invasions than similar areas that hadn't had human influence. In 
addition, insect outbreaks are much more common in the less diverse 
boreal forests than they are in tropical forests. 

In 1973 Robert May published a theoretical study that challenged the 
intuitive ideas that ecologists had about the diversity-stability 
hypothesis. May analyzed randomly constructed communities and 
found that communities with more species tended to be less, not more, 
stable. May's study, like most theoretical studies of the 1970s, assumed 
that population numbers of each species were at equilibria. This 
assumption was made not because it reflected reality, but because it 
made the mathematics more tractable. More recent studies have 
shown that if there is some degree of flux in the population numbers, 
the community can maintain more species than in equilibrium. This 
variability may allow different species to respond differently to the 
environment, and can result in fewer species being lost due to 
competitive exclusion. When theoretical ecologists relax the 
equilibrium assumption and allow for population fluxes, they find 
results consistent with the diversity-stability hypothesis: 
communities with more species are more stable. 

Figure 2. Once a common bird of 
eastern North America, the last 
passenger pigeon died in a zoo in 1914. 

John J. Audubon, (1829). 
Courtesy of Haley & Steele Art Gallery. 

Several lines of evidence now support the diversity-stability hypothesis. 
The studies conducted by David Tilman and his colleagues provide 
some of the strongest evidence for the hypothesis. In 1982 Tilman 
divided grassland fields in Minnesota's Cedar Creek Natural History 
Area into more than 200 plots. He and his colleagues monitored the 
species richness and community biomass (the total mass of all plants) in 
each of those fields over the next two decades. They found that 
diversity within a community is positively correlated with plant 
community stability, as defined by the extent of variation in 
community biomass. Various other studies at different scales have 
found similar results: stability increases with diversity. 4 
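
One simple way to express "extent of variation in community biomass" 
is the coefficient of variation: the standard deviation of biomass over 
time divided by its mean. The Python sketch below computes it for two 
plots. The biomass values are invented for illustration and are not 
Cedar Creek data, and the choice of the coefficient of variation is our 
assumption about how variation might be summarized. 

```python
from statistics import mean, pstdev

def biomass_cv(series):
    """Coefficient of variation of biomass over time (lower = more stable)."""
    return pstdev(series) / mean(series)

# Hypothetical yearly biomass readings for two plots (made-up numbers)
low_diversity_plot = [120, 60, 150, 40, 130]
high_diversity_plot = [110, 95, 120, 90, 105]

print(round(biomass_cv(low_diversity_plot), 2))
print(round(biomass_cv(high_diversity_plot), 2))
```

In this invented example the more diverse plot fluctuates less relative 
to its mean, which is the pattern the hypothesis predicts. 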

Mass Extinctions 

Imagine a meteor ten kilometers wide hitting Earth. The resulting 
impact would cause ferocious tidal waves and massive earthquakes. 
Sulfuric acid would be released into the air, leading to intensely acidic 
rain. Later the atmosphere would dramatically cool because of the 
dust. The impact would affect nearly all life to some extent, and almost 
certainly there would be a significant decline in biodiversity. 




Figure 3. A re-creation by NASA 
scientists of the impact made by an 
asteroid at Chicxulub, on the Yucatan 
Peninsula. This impact is thought to 
be the cause of the K/T mass 
extinction 65 million years ago. 



Re-creation of Chicxulub Impact (2001). Courtesy of NASA. 

Such a scenario is not just the plot of a Hollywood movie like 
Deep Impact. A meteor that size actually did hit Mexico's Yucatan 
peninsula sixty-five million years ago (Fig. 3). The consequences of 
the impact led to the extinction of many major groups of animals, 
most notably the dinosaurs. This mass extinction marked the end of 
the Cretaceous (K) period and the beginning of the Tertiary (T), and 
is known as the K/T extinction. 
Although the K/T mass extinction is the best known, it was not the 
largest. That honor belongs to the mass extinction at the end of the 
Permian period, 250 million years ago. It is often exceedingly difficult 
to distinguish species in the fossil record, so paleontologists studying 
extinction usually examine the disappearance of larger taxa (like 
genera or families). At the end-Permian extinction, sixty percent of 
families went extinct. Based on the family extinction data, David Raup 
extrapolated that up to ninety-six percent of species went extinct at 
this time. Most paleontologists recognize three other mass extinctions, 
for a total of five (Fig. 4). 




Figure 4. The graph shows an 
approximate time line of loss of 
families of species from the earth 
during the five so-called "mass 
extinctions." Below, the trilobite was 
a victim of the extinction at the end 
of the Permian period, and the familiar 
Tyrannosaurus rex died out 
with the K/T extinction 
65 million years ago. 




Carl Buell (2003). Courtesy of the artist. 



Although these mass extinctions happened during a short period by 
geological scales, they were not instantaneous. In fact, the extinctions 
probably occurred over a period of a few million years. 

What were the causes of the mass extinctions? We know the most 
about the asteroid-caused K/T extinction. Based on changes in the 
floral composition around the K/T boundary, some paleobotanists 
have speculated that there was global cooling after the 
extraterrestrial impact. Oceanic cooling may have led to the 
disappearance of reef-building organisms. We know less about the 
other extinctions, but it is likely that they were marked by periods of 
global climate change as well. 

Species extinctions during mass extinction events account for only a 
few percent of total extinctions. Indeed, some paleontologists have 
wondered whether there is anything special about mass extinction 
events. Species extinctions occur often but at different rates across 
time. Perhaps mass extinctions are merely the tail-end of the 
distribution of extinction rates. 
The Sixth Mass Extinction 

Should we consider, as some environmental scientists have, that the 
current biodiversity crisis is the start of a sixth mass extinction? 
Regardless of how one answers that question, it is clear that we are 
losing species at rates that, while exceedingly difficult to calculate, are 
above the background extinction rate and far exceed the speciation 
rate. Estimates are that 100,000-500,000 species of insects will go 
extinct in the next 300 years. The higher end of that estimate is 
comparable to the magnitude of the loss of species during the previous 
mass extinction episodes. Even the lower estimate represents a 
considerable loss of biodiversity. Moreover, 300 years is much shorter 
than the duration of those mass extinction periods. 

The current biodiversity crisis stems from several causes; the two major 
contributors are habitat destruction and global climate change, both 
of which are largely due to human activity. As discussed earlier, much 
of the (largely unexplored) biodiversity lies in the tropics and, in 
particular, tropical rain forests. Tropical forests are being lost at an 
alarming rate. Conservative estimates place the loss of rain forest 
during the 1980s and 1990s at about 0.8% per year. This is in large part 
due to changes in the way the land has been used. For quite a long 
time, many areas had practiced slash and burn agriculture. In recent 
decades, however, the practice of cutting and clearing has been used 
increasingly for grazing or timber harvest, resulting in the loss of the 
tropical forest habitat. As a consequence, countless thousands of 
species (most of which are unknown to humans) are imperiled. 
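
To see what a rate of 0.8% per year implies over two decades, the 
Python sketch below compounds the loss year by year. It assumes, for 
simplicity, that the same proportion of the remaining forest is lost 
each year; the function name is ours. 

```python
def fraction_remaining(annual_loss, years):
    """Fraction of forest left after compounding a constant annual loss."""
    return (1 - annual_loss) ** years

# 0.8% per year across the 1980s and 1990s (20 years)
remaining = fraction_remaining(0.008, 20)
print(f"{remaining:.3f} remaining, {1 - remaining:.3f} lost")
```

Under that assumption roughly 15% of the forest standing at the start 
of the period would be gone after twenty years. 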

Global climate change has also impacted biodiversity. During the 
twentieth century, the mean temperature has increased by slightly 
more than one degree Fahrenheit (0.6 degree Celsius), and most of 
that change occurred between 1970 and 2000. Projections vary 
between x and y degrees Fahrenheit increase by mid-century. These 
changes do not appear great in the context of daily and seasonal 
temperature fluctuations, but they are large in comparison with 
prehistoric climate changes. While the magnitude of these changes is 
not beyond the range of historical variation, the rate at which the 
change has taken place appears to be so. The climate change is human 
induced, due mainly to increases in carbon dioxide and other 
"greenhouse gases" that have appeared since the Industrial Revolution 
and accelerated during the twentieth century. 

The human-induced global climate change is coupled with other 
climate cycles of various temporal and spatial scales. For example, the 
eastern United States had a cold winter in 2002-3 after several mild 
winters. In contrast, the western United States had a milder than 
normal winter that year. The pattern in 2002-3, most likely due to El 
Niño, does not invalidate the global upward climb in temperatures 
over a decades-long timespan. In addition to a mean increase in 
temperature, human-induced global warming is also likely to cause 
increased variation in climate. Some climate models suggest that the 
global warming may actually cause the northeastern United States to 
be cooler. The reason for this seemingly paradoxical possibility is that 
warming of the oceans could cause the Gulf Stream to be diverted 
south and east. Were this to happen, it would cause the Atlantic coast 
to be cooler. Regardless of the specifics of the local changes, more 
extreme weather will likely stress already fragile ecosystems. 
A paper published by Terry Root and her colleagues in 2003 shows that 
many species have altered their geographic ranges, presumably as a 
result of global climate change. 5 Of those species that had altered 
their range, eighty percent had shifted in the direction predicted by 
climate change models. The mean range shift was about six 
kilometers per decade. In addition, many bird species have started 
laying eggs earlier in the spring. This study shows that small, 
sustained changes can be powerful over a long enough time scale. But 
what about species that are unable to move? What will happen as 
their habitat changes due to human-induced global climate change? 

Because of human-induced climate change and habitat destruction, we 
face a grave and growing crisis. Biodiversity is being lost at alarming 
but unknown rates. Moreover, if the diversity-stability hypothesis is 
true, loss of some species may trigger the loss of others, leading to a 
vicious circle. Although our knowledge about biodiversity and the 
extent to which it is lost is too meager, the consequences are too grave 
to continue in ignorance. 
Biodiversity. The total diversity 
of all life in a given locale. 

Demographic stochasticity. 
Variation in numbers or genders 
of offspring via chance. When 
population sizes are low, these 
chance factors can lead to 
extinction. 

Diversity-stability hypothesis. 
Communities that contain more 
species will vary less through time 
in response to various 
disturbances. 

Keystone species. A species 
whose presence has a dramatic 
effect on the persistence of 
other species. 
Genetically Modified Organisms 

"And God said... let them have dominion over the fish of 
the sea, and over the fowl of the air, and over the cattle, 
and over all the earth, and over every creeping thing 
that creepeth upon the earth." Genesis 1:26, The Holy Bible 

"The Earth does not belong to us. We belong to the Earth." 

Chief Seattle 
Introduction 

Humans purposefully manipulate the evolution of other organisms. For 
thousands of years farmers have used selective breeding to improve 
their livestock and crops. As a result, we have cows that produce more 
milk, hens that lay more eggs, sheep with better wool, and 
disease-resistant plants with higher productivity. Another striking 
example of humans altering other organisms is the great diversity of 
dog breeds, from the toy poodle to the Great Dane. 

Although humans have been manipulating organisms for millennia, 
genetic engineering simplifies and targets manipulations in an 
unprecedented way. Transgenic plants and animals are generated with 
characteristics that cannot be obtained using traditional breeding. 
Unlike organisms generated by selective breeding, transgenic 
organisms (also known as "recombinant organisms") by definition 
contain genes from other species. Genetic engineering techniques are 
used to generate recombinant DNA, which contains sequences from 
different organisms. This DNA then becomes incorporated into a host 
so that it can be passed to subsequent generations. For example, Bt 
corn expresses a gene for an insecticidal toxin that was "donated" by 
the bacterium Bacillus thuringiensis. 

The use of recombinant organisms has become commonplace. For 
example, bacteria produce human insulin and hepatitis vaccines, and 
some crop plants are cultivated to be resistant to certain herbicides 
and insects. There are also transgenic livestock that produce human 
proteins, such as antithrombin III. The economic value of such products 
drives research. How did it start? Where is it going? What are the 
challenges and the risks? In this unit we will explore these questions as 
they relate to modifying bacteria, plants, and animals. 




Figure 1. The ancestor of modern 
corn had tiny kernels, each protected 
by a tough husk. Domestication of 
maize, which began thousands of years 
ago, selected for large sheathed cobs 
containing large kernels without husks. 



Genetic Modification of Bacteria 

Bacteria — the first organisms to be genetically engineered — are used 
for replicating and altering genes that are subsequently introduced 
into plants or animals. Bacterial systems lend themselves to genetic 
manipulation in part because of their rapid reproduction rates. It is 
easy to produce a genetically identical population — a clone of 
bacteria — all containing the gene of interest in a short period. The 
cells can then be lysed and DNA can be isolated in short order. Bacteria 
are routinely used to produce non-bacterial proteins. An example is 
the production of purified proteins for vaccine use. Such proteins can 
be safer than, and as effective as, vaccines that contain killed or attenuated 
(weakened) pathogens. Genetic engineering can also produce 
extensive changes in the bacterium's metabolism. For example, 
bacteria can be provided with several genes encoding enzymes that 
allow the production of fuel alcohol from wood. 

Researchers have taken advantage of nature to modify bacteria. 
Plasmids are small, circular, self-replicating, extrachromosomal pieces 
of DNA that occur naturally. A plasmid can encode a protein that offers 
its host a selective advantage. For example, a plasmid that encodes an 
antibiotic allows its host bacterium to thwart competing microbes. 
Alternatively, a bacterium might possess a plasmid that encodes 
antibiotic resistance. Plasmids are readily isolated from bacterial cells 
and can be altered in vitro by inserting or deleting specific sequences 
of DNA. Because they can be used to create clones of genes, plasmids 
are called cloning vectors. 

Getting the Plasmid In 

In nature bacteria have various enzymes that cut up the DNA of their 
natural enemies, such as bacteriophages (bacterial viruses). Researchers 
have taken advantage of these so-called restriction enzymes to 
splice DNA for use in engineering bacteria. Hundreds of restriction 
enzymes have been isolated and each will cut a DNA strand at a 
specific sequence of nucleotides. Some restriction enzymes generate 
blunt ends, cutting across both strands of DNA. Others generate a 
staggered cut, producing "sticky ends." These ends anneal by 
hydrogen bonding to similar ends on another DNA segment cut with 
the same restriction enzyme. 
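
To make the idea of a sequence-specific cut concrete, the Python 
sketch below digests a short, made-up DNA sequence. It assumes the 
enzyme EcoRI, which is not named in the text above but is a widely used 
example: it recognizes GAATTC and cuts after the G. Only the top strand 
is modeled; the function name and test sequence are hypothetical. 

```python
SITE = "GAATTC"   # EcoRI recognition sequence (example enzyme, an assumption)
CUT_OFFSET = 1    # EcoRI cuts the top strand between G and AATTC

def digest(seq, site=SITE, offset=CUT_OFFSET):
    """Cut seq at every occurrence of site; return top-strand fragments."""
    fragments = []
    start = 0
    pos = seq.find(site)
    while pos != -1:
        # Keep everything up to and including the base before the cut
        fragments.append(seq[start:pos + offset])
        start = pos + offset
        pos = seq.find(site, pos + 1)
    fragments.append(seq[start:])  # remainder after the last cut
    return fragments

dna = "ATGCGAATTCGGTACCGAATTCTTAG"  # invented sequence with two sites
print(digest(dna))  # ['ATGCG', 'AATTCGGTACCG', 'AATTCTTAG']
```

On the real double helix the complementary strand is cut at the 
mirror-image position, so each fragment end carries a single-stranded 
AATT overhang: a sticky end that can pair with any other end made by 
the same enzyme. 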

Cloning a gene involves identifying a gene of interest in an organism, 
isolating DNA from that organism, and then using a restriction enzyme 
to snip the gene from the DNA strand. The gene-containing segment 
can then be spliced into a plasmid cut by the same restriction enzyme. 
The bacteria take up the plasmid and are allowed to replicate. 

Ordinarily, bacterial cells do not readily take up plasmids. Researchers 
can use various tricks, however, to get cells more ready to do so. One 
common method holds the cells on ice in a solution of calcium 
chloride. The cells are then briefly heat shocked so the plasmid can 
cross the plasma membrane. An alternative method, electroporation, 
uses a short electrical pulse to open pores in the plasma membrane, 
allowing the plasmid to pass through. 

Marker genes, such as genes for antibiotic resistance, are often 
engineered into plasmids. These marker genes enable researchers to 
know which bacteria have the plasmids. The antibiotic is added to the 
media used to grow the bacteria. Cells that do not contain the plasmid 
will fail to reproduce. In addition to marker genes, plasmids typically 
contain one or more genes of interest. For example, a protein not 
otherwise expressed by the recipient cell might be produced only when 
the plasmid is present. Individual colonies of bacteria, each derived 
from a single cell, can be evaluated for the expression of such novel 
gene products. 

Protein production can be straightforward if the source of the novel 
gene was another bacterium. However, the goal of modifying bacteria 
might be the production of proteins encoded by eukaryotic genes from 
fungi, plants, or animals. This presents challenges. Eukaryotic DNA 
contains both exons (coding sequences) and introns (intervening 
sequences). In eukaryotic cells this DNA is used as a template for the 
production of mRNA, which must then undergo mRNA splicing. 
Introns are removed and exons are joined to form the mRNA, which 
travels to the ribosome for protein production. Bacteria lack the 
enzymes necessary for mRNA splicing, so introducing a eukaryotic gene 
into bacteria requires a special procedure. First, the enzyme reverse 
transcriptase uses the already spliced mRNA as a template to generate 
a complementary strand of DNA. A second DNA strand is then synthesized 
to yield a double-stranded DNA molecule called cDNA. 
Finally, this cDNA is incorporated into the cloning vector. 
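
The two steps, splicing and reverse transcription, can be mimicked 
with strings. In this Python sketch the gene, its exon coordinates, and 
the function names are all invented for illustration; real genes are 
far longer, and splice sites are recognized by cellular machinery, not 
by fixed coordinates. 

```python
def transcribe_and_splice(gene, exons):
    """Join the exon slices of the coding strand and write them as RNA."""
    spliced = "".join(gene[start:end] for start, end in exons)
    return spliced.replace("T", "U")  # RNA uses U in place of T

# Pairing rules for building DNA on an RNA template
COMPLEMENT = {"A": "T", "U": "A", "G": "C", "C": "G"}

def reverse_transcribe(mrna):
    """First-strand cDNA complementary to the mRNA, written 5' to 3'."""
    return "".join(COMPLEMENT[base] for base in reversed(mrna))

# Toy gene: exon 1 + intron + exon 2 (hypothetical sequence)
gene = "ATGGCC" + "GTAAGTTTTCAG" + "TTCTAA"
exons = [(0, 6), (18, 24)]

mrna = transcribe_and_splice(gene, exons)
cdna = reverse_transcribe(mrna)
print(mrna)  # AUGGCCUUCUAA
print(cdna)  # TTAGAAGGCCAT
```

The resulting cDNA contains no intron sequence, which is why a 
bacterium can express it without any splicing machinery. 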

Expressing eukaryotic genes in bacteria presents other problems. After 
proteins are assembled in eukaryotic cells they are often modified. (See 
the Proteomics unit.) For example, various sugars may be attached to 
the polypeptide so that glycoproteins are formed. Bacteria are 
generally unable to accomplish such post-translational modifications, 
and eukaryotic genes expressed in bacteria may not function properly. 
The inability of bacteria to perform such modifications has driven 
scientists to use yeast (Saccharomyces cerevisiae) and eukaryotic cell 
culture to produce some recombinant products. 

Are Recombinant Bacteria Safe? 

Concerns about the safety of recombinant bacteria were voiced as the 
technology was developed. Some feared that new, untreatable human 
pathogens could be inadvertently generated. In 1974 prominent 
researchers self-imposed a moratorium on certain experiments until 
they could assess the hazards. After much discussion, the researchers 
developed biological containment procedures. These include 
generating recombinant DNA only in bacteria that have mutations to 
prevent them from surviving outside of the laboratory. The release of 
recombinant microbes into the environment remains controversial. 

Genetic Modification of Plants 

New traits introduced to crop plants by genetic engineering have the 
potential to increase crop yields, improve agricultural practices, or add 
nutritional quality to products. For example, transgenic crop plants 
capable of degrading weed killers allow farmers to spray weeds 
without affecting yield. Use of herbicide-tolerant crops may also allow 
farmers to move away from preemergent herbicides and reduce 
tillage, thereby decreasing soil erosion and water loss. Transgenic 
plants that express insecticidal toxins resist attacks from insects. Crops 
engineered to resist insects are an alternative to sprays, which may not 
reach all parts of the plant. They are also cost effective, reducing the 
use of synthetic insecticides. Genetic engineering has also been used to 
increase the nutritional value of food; "golden rice" is engineered to 
produce beta-carotene, for example. Edible vaccines, present in the 
plants we eat, may be on the horizon. 

The new traits expressed in such transgenic plants are derived from a 
variety of other organisms. Scientists have given a gene from the 
bacterium Salmonella to cultivars of soybeans, corn, canola, and cotton 
to degrade the herbicide glyphosate (Roundup™). The gene for the 
insecticidal toxin in transgenic cotton, potato, and corn plants comes 
from the bacterium Bacillus thuringiensis (Bt). One of the genes 
allowing vitamin A production in golden rice is derived from the 
bacterium Erwinia uredovora; others are from the daffodil. 

The development of golden rice involved the introduction of several 
genes into a plant to provide a multistep biochemical pathway 
(Fig. 2). Rice grain, which serves as a food staple for much of the 
world, lacks vitamin A. An estimated 100 million to 200 million children 
worldwide have vitamin A deficiency, a condition that causes 
blindness and increases susceptibility to diarrhea, respiratory infection, 
and childhood diseases such as measles. Beta-carotene and other 
carotenes (the red, yellow, and orange pigments found in carrots and 
other vegetables) are the precursors of vitamin A. Rice synthesizes 
beta-carotene in its chloroplasts but not in the edible seed tissue. 

Ingo Potrykus and his colleagues found that geranylgeranyl 
diphosphate (GGPP), a precursor to carotenoid production, is present 
in rice seed. They genetically engineered golden rice to express the 
enzymes necessary for the conversion of GGPP to beta-carotene. The 
synthesis of beta-carotene from geranylgeranyl diphosphate requires 
four biochemical reactions, each catalyzed by a different enzyme. A 
bacterium, Agrobacterium tumefaciens, containing three plasmids, was 
used to introduce all the genes necessary for the complete biochemical 
pathway for beta-carotene production. It was possible to use three 
enzymes instead of four because the bacterial enzyme phytoene 
desaturase accomplishes what two plant enzymes (phytoene 
desaturase and zeta-carotene desaturase) do. 
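
The enzyme counting in this paragraph can be checked with a small sketch. It is only an illustration: the intermediate names (phytoene, zeta-carotene, lycopene) and the names of the first and last enzymes are assumptions taken from standard carotenoid biochemistry rather than from this unit, and the second plant desaturase is labeled zeta-carotene desaturase, its usual name in the literature.

```python
# Toy model of the carotenoid pathway described above.
# Intermediate names and the first and last enzyme names are assumptions
# drawn from standard carotenoid biochemistry, not from the text.

# Plant route: four reactions, each with its own enzyme.
PLANT_PATHWAY = [
    ("GGPP", "phytoene", "phytoene synthase"),
    ("phytoene", "zeta-carotene", "phytoene desaturase"),
    ("zeta-carotene", "lycopene", "zeta-carotene desaturase"),
    ("lycopene", "beta-carotene", "lycopene beta-cyclase"),
]

# Engineered route: the bacterial desaturase spans two plant steps.
GOLDEN_RICE_ENZYMES = {
    "phytoene synthase": ("GGPP", "phytoene"),
    "bacterial phytoene desaturase": ("phytoene", "lycopene"),
    "lycopene beta-cyclase": ("lycopene", "beta-carotene"),
}


def route(start, goal, enzymes):
    """Return the enzymes used, in order, to convert start into goal.

    Returns None if some step has no enzyme. Assumes each substrate has at
    most one enzyme and that the pathway contains no cycles.
    """
    by_substrate = {sub: (name, prod) for name, (sub, prod) in enzymes.items()}
    current, used = start, []
    while current != goal:
        if current not in by_substrate:
            return None
        name, current = by_substrate[current]
        used.append(name)
    return used


if __name__ == "__main__":
    print(len(PLANT_PATHWAY), "enzymes in the plant route")
    print(route("GGPP", "beta-carotene", GOLDEN_RICE_ENZYMES))
```

Removing the bacterial desaturase from the engineered set leaves the route incomplete, which is why that one enzyme can stand in for two plant enzymes.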

If transgenic plants can help prevent vitamin deficiencies, can they also 
produce vaccines? Edible vaccines available in crops could help people 
in developing nations where transportation, refrigeration, and 
disposable needle supplies are limited. Hugh Mason and his colleagues 
(Boyce Thompson Institute) have expressed a gene that encodes an 
E. coli protein in potatoes. Volunteers who ate raw, modified potatoes 
developed antibodies to the protein. Research is underway to see 
whether the antibodies will protect against diarrhea induced by 
disease-causing E. coli. 

Techniques Used for Generating Transgenic Plants 

As with bacteria, the ability to genetically modify plants depends on 
obtaining genetically identical populations and readily manipulating 
DNA. How do you "clone" a plant? Many plant species naturally 
undergo asexual reproduction by fragmentation, where segments 
from a parent plant regenerate a new plant. It is also possible to grow 
plants in culture from small explants. Another method is to culture 
plants from totipotent cells found in plant meristems. These plant 
cells can divide and differentiate into the various types of specialized 
cells. In a test tube, plant cells will divide and form an undifferentiated 
callus. When hormones in the culture medium are adjusted, the callus 
will sprout shoots and roots and eventually develop into a plantlet that 
can be transplanted to soil. To clone a plant — perhaps a plant with 
new genes — the growing callus is simply subdivided. Thousands of 
genetically identical plants can be generated in this way. 

How do you get a plant to take up a gene? Researchers working with 
rice often use the soil bacterium Agrobacterium tumefaciens. This 
bacterium, the cause of crown gall disease in many fruit plants, is well 
known for its ability to infect plants with a tumor-inducing (Ti) 
plasmid. A section of the Ti plasmid, called T-DNA, integrates into 
chromosomes of the plant. Recombinant DNA can be added to the 
T-DNA, the gall-inducing genes removed, and infection by the 
bacteria — containing the recombinant plasmid — will provide for 
transfer of novel genes to plant embryos. 

Although Agrobacterium tumefaciens works for introducing plasmids 
into rice, not all plants are equally susceptible to this bacterium. 
Researchers interested in modifying crops such as wheat and corn 
have turned to other methods for delivering genes to plant cells. One 
approach is to use a "gene gun" (Fig. 3), which fires plastic bullets 
filled with DNA-coated metallic pellets. An explosive blast or burst of 
gas propels the bullet toward a stop plate. The DNA-coated pellets are 
directed through an aperture in the stop plate, and then penetrate 
the walls and membranes of their cellular targets. Some projectiles 
penetrate the nuclei of cells, where occasionally the introduced DNA 
integrates into the DNA of the plant genome. Transformed cells can 
then be cloned in culture. 

Marker genes are often included in DNA constructs so that plants that 
have acquired the novel DNA can be selected. In plants, marker genes 
include those for herbicide resistance. Plants that grow in the 
presence of the herbicide are assumed to possess the transgene of 
interest. The transgenic plant embryos are cultivated in tissue culture. 
Once mature plants are obtained they are evaluated for the activity of 
the introduced gene, any unintended effect on plant growth, and 
product yield and quality. The ability of the gene to be expressed in 
subsequent plant generations is also evaluated. 

Figure 3. A "gene gun." 

Not all genes are expressed in every tissue of a plant. When golden 
rice was developed it was necessary to ensure that the novel genes 
were expressed in the endosperm of the seed. The endosperm of a 
seed is the starchy component that provides energy and nutrients for 
the developing plant embryo. Regulatory DNA sequences upstream 
from the specified genes were introduced into the recombinant Ti 
plasmids. Such regulatory regions influence where and when a gene 
will be expressed. (See the Genomics unit.) The regulatory regions 
chosen for golden rice provide for uninhibited transcription of the 
genes in the endosperm. 

Problems and Concerns 

Several concerns have been raised regarding transgenic crop plants. 
Foremost is the possibility that the process of genetic engineering 
might inadvertently generate new allergens or toxins that could affect 
human health. Another concern is that introduced genes from 
engineered crops could move to other organisms in the environment. 
Other concerns relate to cultivars that are engineered to produce 
insecticides. The potential development of insecticide resistance in 
target pests is worrisome, as is the possibility that non-target, 
beneficial insects might be affected by engineered plants. 

A particular concern is the possibility that transgenic crop plants could 
affect human health by expressing unanticipated allergens. In March 
1996 researchers at the University of Nebraska showed that an allergen 
from Brazil nuts had been transferred into soybeans. Individuals 
sensitized to Brazil nuts make antibodies (IgE) specific to certain 
proteins in the nuts. Engineered soybeans reacted with such antibodies 
in vitro. Had allergic individuals consumed the transgenic soybeans 
they would have likely experienced IgE-mediated reactions, ranging 
from itching to anaphylaxis. 

Obviously, expressing a known allergen in food crops is unwise. 
However, it is difficult to predict whether a protein expressed in a 
novel organism will cause allergies. A protein isolated from its native 
species may differ from the same protein (with an identical amino acid 
sequence) harvested from a transgenic organism expressing that 
protein. Sometimes sugar or acetyl groups are added to proteins after 
they are manufactured at the ribosome. The forms of sugar or acetyl 
groups may vary between organisms. Sugar groups on proteins have 
been associated with allergenic and immunogenic responses. Hence, 
allergenicity studies ought to be carried out on the actual material 
derived from transgenic plants themselves, rather than on just the 
bacterial proteins. Such studies are not always done. 

Critics are worried that engineered plants might generate toxins as a 
result of the DNA-insertion process. They note that the insertion of 
genetic material (using gene gun technology, for example) is semi- 
random, and that the amount and location of DNA inserted into the 
chromosome varies. If an insert disrupts a regulatory region that serves 
to "turn off" the production of a toxin, the result might be an over- 
expression of toxin. Another concern is the inclusion of regulatory 
regions as part of genetic constructs: the regulation of host genes near 
an insert could be dramatically affected. 

Significant concerns relate to the impact that genetically modified 
plants could make on the environment. In experiments, transgenic crops 
are known to hybridize with closely related species. The probability that 
transgenic traits, as well as other accompanying changes in traits, will 
show up in wild plant relatives is increasing as genetically modified 
crops are established. Herbicide-tolerant weeds can evolve; 
glyphosate-resistant rigid ryegrass (resistant to Roundup™), for example, has 
developed only recently. Genetically modified crops must be monitored 
to reduce unintended degradation of natural ecosystems. 

Crops engineered to produce insecticides, such as Bt toxin, bring other 
concerns. The widespread planting of Bt corn and other crops can 
result in insects evolving a resistance to Bt toxins. At least ten species 
of moths, two species of beetles, and four species of flies already have 
developed resistance to Bt toxins under laboratory exposure. Bt toxins 
administered as a spray are present only transiently. However, 
transgenic crops continuously express the insecticidal protein. This 
ongoing exposure may be more likely to select for resistant insects. 

The emergence of a resistant insect population is likely whenever a 
pesticide is used. One strategy for delaying the emergence of insects 
resistant to Bt toxin is to plant a "refuge" of conventional crops near 
Bt-expressing crops. The idea is that these conventional crops will 
harbor susceptible insects that will mate with resistant insects, diluting 
out recessive resistance alleles. Of course, if resistance develops as a 
dominant allele, this strategy will not work. 
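
The reasoning behind the refuge can be made concrete with a rough calculation. The sketch below is a simplification under stated assumptions: mating is random (Hardy-Weinberg proportions), every survivor from the Bt field is homozygous for the resistance allele, and the insect counts and the refuge allele frequency are hypothetical numbers chosen only for illustration.

```python
def pooled_allele_frequency(bt_survivors, refuge_adults, q_refuge):
    """Resistance allele frequency in the combined mating pool.

    Assumes Bt-field survivors are all homozygous resistant (frequency 1.0)
    and refuge insects carry the allele at frequency q_refuge.
    """
    total = bt_survivors + refuge_adults
    return (bt_survivors * 1.0 + refuge_adults * q_refuge) / total


def resistant_fraction(q, dominant=False):
    """Fraction of offspring with a resistant phenotype under random mating."""
    if dominant:
        return 1 - (1 - q) ** 2  # heterozygotes are resistant too
    return q ** 2                # only homozygotes are resistant


if __name__ == "__main__":
    # Hypothetical numbers: 10 survivors from the Bt field, 990 refuge insects.
    q = pooled_allele_frequency(bt_survivors=10, refuge_adults=990, q_refuge=0.01)
    print("pooled allele frequency:", round(q, 4))
    print("resistant offspring if recessive:", round(resistant_fraction(q), 6))
    print("resistant offspring if dominant:", round(resistant_fraction(q, True), 6))
    print("resistant offspring with no refuge:", resistant_fraction(1.0))
```

At the same pooled allele frequency, a recessive allele yields far fewer resistant offspring than a dominant one, which is the point made in the paragraph above.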

There are hundreds of known subspecies of Bacillus thuringiensis, and 
the insecticidal toxin derived from each is poisonous only to certain 
species of insects. Nevertheless, there are concerns that plants 
expressing genes for such toxins could affect non-target insect species. 
Some of these species may be beneficial, such as those that provide 
pollination or consume pests. Laboratory experiments suggest an 
increased mortality of Monarch butterflies that ingested Bt corn 
pollen. How frequently this occurs in the field is unknown, and not all 
laboratory studies have given similar results. The Environmental 
Protection Agency (EPA) requires toxicity tests on a standard set of 
organisms before a pesticide can be registered. As of December 2002 
the EPA had not demonstrated toxicity of Bt to non-target species. 
Data gathering continues. 

Genetic Modification of Animals 

Dolly the lamb stole the headlines as the first example of livestock 
cloned from DNA of an adult animal. But the real breakthrough came 
with Polly, the first transgenic lamb. Born the year after Dolly, Polly 
was given a human gene that encodes blood-clotting factor IX, the 
protein missing in people with one form of hemophilia. Harvesting 
such proteins from transgenic livestock is one goal of this research. The 
road to Polly and subsequent transgenic animals began with research 
using genetically altered mice. Along the way, technologies for cloning 
animals, modifying DNA, and targeting expression of proteins to 
specific tissues were developed. Someday, human gene therapy — 
supplying genes to patients with missing or altered proteins — may 
become common practice. However, significant challenges remain. 
Moreover, risks and ethical concerns must be addressed. 

Antithrombin III (AT-III) is an example of a pharmaceutical produced in 
transgenic livestock. A normal level of AT-III keeps the formation of 
blood clots under control. Patients with AT-III deficiency may have 
thromboembolic problems beginning in early adulthood, particularly 
clots in the legs and pulmonary embolism. Providing therapeutic AT-III 
can reduce clotting risks in such patients. Other therapeutic proteins 
being considered for production by transgenic animals include human 
hemoglobin, human serum albumin, tissue plasminogen activator 
(used to treat stroke), human alpha-1-antitrypsin (alpha-1-antitrypsin 
deficiency can cause life-threatening emphysema), various vaccine 
proteins, and monoclonal antibodies. 

For some time, mice have been genetically altered to exhibit human 
genetic disease. To generate such animal models normal genes in mice 
are inactivated using "knockout" technology, or altered by 
replacement of the normal gene with a mutated counterpart. Mouse 
disease models now exist for cystic fibrosis, beta-thalassaemia, 
atherosclerosis, retinoblastoma, and Duchenne muscular dystrophy. 
Such animal models allow researchers to test therapeutic compounds 
and study the molecular basis of given diseases. 

Knockout technology, as well as other genetic engineering approaches, 
depends on the ability to target genes for insertion into particular 
locations within the host chromosome. To do this, a region on the 
chromosome is identified and DNA homologous to that region is 
engineered into a cloning vector. The newly inserted sequence can then 
be disrupted by insertion of a selected gene; for example, a marker 
gene encoding antibiotic resistance. Once cells take up the DNA, 
homologous recombination on either side of the marker gene allows it 
to be precisely inserted into the chromosome. At the same time, some 
or all of the target gene on the chromosome is deleted (Fig. 4). 
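
The double-crossover replacement can be mimicked with plain strings. This is a toy sketch, not a model of real recombination: the sequences, the `[neoR]` marker label, and the function name `knockout` are all hypothetical, and homology is reduced to exact string matching.

```python
def knockout(chromosome, left_arm, right_arm, marker):
    """Replace the segment between two homology arms with a marker.

    Mimics two crossovers, one in each arm. Returns None when either arm
    cannot be found, that is, when there is no homology to recombine with.
    """
    start = chromosome.find(left_arm)
    if start == -1:
        return None
    left_end = start + len(left_arm)
    right_start = chromosome.find(right_arm, left_end)
    if right_start == -1:
        return None
    # Keep everything up to the end of the left arm, insert the marker,
    # then resume at the right arm. The segment in between is deleted.
    return chromosome[:left_end] + marker + chromosome[right_start:]


if __name__ == "__main__":
    # Hypothetical target gene split into a left arm, a middle, and a right arm.
    left, middle, right = "ATGGCTGAC", "TTTCCCAAA", "GGATCGTAG"
    chromosome = "AACC" + left + middle + right + "GGTT"
    print(knockout(chromosome, left, right, "[neoR]"))
```

The marker ends up flanked by the two arms, and the middle of the target gene is lost, matching the statement that some or all of the target gene is deleted.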

Gene knockout in pigs is being studied as an avenue for transplanting 
animal organs into humans. A major cause of tissue rejection is an 
immune reaction to the carbohydrate galactose-α-1,3-galactose on the 
surface of non-human cells. Deletion of the α-1,3-galactosyltransferase 
gene may allow the production of animals lacking this surface marker. 

As researchers recognized the potential of transgenic livestock for the 
production of human therapeutics and transplant tissue, farmers 
recognized the contributions that genetic engineering might make to 
the economics of livestock production. Cows might be produced that 
could grow more muscle mass, require less feed, produce more milk, or 
be leaner. The composition of milk could be changed; for example, 
casein could be over-expressed to provide increased cheese production. 
Lactose might be removed from milk for lactose-intolerant consumers. 
Disease-resistant animals could reduce the use of antibiotics. Poultry 
with less fat content and eggs with lower cholesterol are other goals. 

Cloning Animals 

Asexual reproduction in bacteria and plants allows scientists to obtain 
genetically identical populations; this does not occur naturally in 
vertebrates, except in identical twins. In 1996 Dolly the lamb was born: 
chromosomal material derived from an adult sheep was used to 
generate an animal with chromosomal DNA identical to that of the 
donor animal. Cloning livestock, using the techniques that generated 
Dolly, may become an economical method for traditional breeders to 
replicate their superior animals and provide them to farmers. Rather 
than selling semen, breeding companies might distribute cloned 
embryos for implantation into surrogate cows. Because Dolly did not 
possess foreign DNA she was not transgenic. However, she did 
represent a valuable step toward the development of transgenic 
livestock. With donor DNA for cloning derived from cultured 
recombinant cells, it becomes possible to carry out specific genetic 
modifications and introduce the modified genes into animals. 

Figure 4. The plasmid contains a gene 
interrupted by a marker gene (XR). 
Recombination involving two crossovers 
between the plasmid and wild type 
chromosomal DNA with the interrupted 

Figure 5. A donor cell is fused with 
an enucleated egg cell by subjecting the 
two cells to pulses of electricity. The cell 
replicates in culture, generating an 
embryo, which is then introduced into 
the uterus of a female for development. 



Nuclear Transfer 

Ian Wilmut and his colleagues cloned Dolly using a technique called 
nuclear transfer 7. In this technique, the nucleus of a recipient egg is 
removed to make way for the genetic material of the donor (Fig. 5). 
The donor cell is fused with the enucleated egg cell by subjecting the 
two cells to pulses of electricity. Earlier studies had suggested that 
donor nuclei from early embryos were more likely to develop properly. 
The use of an adult cell for the donor nucleus was unique in Dolly's 
case. Although most differentiated animal cells contain all the genes 
for making an entire organism, nuclei change as cells differentiate. To 
dedifferentiate the udder cells used for nuclear transfer, Wilmut's team 
cultured the cells in a nutrient-poor medium. This caused the cell cycle 
to stop in the G0 phase. After fusion, 277 embryos were grown in culture 
for six days before being implanted in thirteen surrogate mothers. Only one 
of the embryos completed normal development. 

Cloning by nuclear transfer depends on the availability of donor cells 
with the appropriate genetic information. Somatic cells such as 
fibroblasts, ovarian cells, muscle cells, and mammary epithelia are 
grown in cell culture and genetically modified before fusion with the 
enucleated egg. Commonly, DNA is transferred to the cells using viruses. 

Microinjection and Other Techniques 

Another technique for generating transgenic animals is microinjection. In this 
technique, a gene construct is characterized in culture and an 
adequate quantity of the desired DNA is obtained. The DNA is injected 
into fertilized ova before the first cell division occurs. This increases the 
probability that all of the cells of the organism will harbor the gene. 
The injection is done soon after fertilization, before the male and 
female pronuclei have fused. A very thin pipette or needle injects the 
DNA into the large male pronucleus (Fig. 6). Surrogate mothers are 
made pseudo-pregnant with hormones and implanted with the 
injected eggs. After birth, tissue samples of the young are assessed for 
the presence of the desired gene. DNA from germ line cells is given 
special attention. If the novel gene is present in these cells, the animal 
can be used as a founder for breeding. 




Figure 6. Microinjection. 



Genetic constructs that include regulatory regions targeting gene 
expression to specific tissues are necessary if the gene product is to be 
harvested readily. For example, GTC Biotherapeutics uses the 
beta-casein promoter to ensure that antithrombin III is secreted in goat 
milk. Common biochemical procedures, such as filtration and 
chromatography, are then used to isolate the AT-III from the milk. 

Scientists often use Southern blots to evaluate DNA extracts from 
tissue samples (Fig. 7). Southern blotting is a type of nucleic acid 
hybridization test in which single-stranded DNA molecules from two 
sources interact. Strands with similar nucleic acid sequences will anneal by base 
pairing (A with T, and G with C) to form double-stranded molecules. 
One of the single-stranded DNA molecules is a unique portion of the 
gene of interest, and is radiolabeled so it can be detected on 
photographic film (the probe). Southern blotting allows the detection 
of fragments of genomic DNA, which anneal to the radiolabeled 
probe. The fragments are generated using restriction enzymes and 
separated in a gel by electrophoresis. The size of a given fragment 
relates to the distance it migrates on electrophoresis. The fragments 
are denatured to single strands, transferred to a special filter paper 
that is immersed in a solution containing the probe, and then rinsed. If 
the probe has annealed it will expose the photographic film, resulting 
in a band. 
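
The logic of a Southern blot (digest, separate by size, probe) can be sketched in a few lines. This is a simplified illustration under stated assumptions: the enzyme is taken to be EcoRI, which recognizes GAATTC and cuts after the first G; the genomic sequence and probe are hypothetical; and annealing is reduced to an exact match against either strand.

```python
def reverse_complement(seq):
    """Return the complementary strand, read 5' to 3'."""
    pairs = {"A": "T", "T": "A", "G": "C", "C": "G"}
    return "".join(pairs[base] for base in reversed(seq))


def digest(dna, site="GAATTC", cut_offset=1):
    """Cut dna at every occurrence of site, cut_offset bases into the site."""
    fragments, start = [], 0
    pos = dna.find(site)
    while pos != -1:
        fragments.append(dna[start:pos + cut_offset])
        start = pos + cut_offset
        pos = dna.find(site, pos + 1)
    fragments.append(dna[start:])
    return fragments


def southern_blot(dna, probe):
    """Return the sizes of the fragments the probe anneals to.

    Fragments are listed largest first, the order in which they sit in a
    gel from the loading well downward.
    """
    bands = []
    for fragment in sorted(digest(dna), key=len, reverse=True):
        if probe in fragment or reverse_complement(probe) in fragment:
            bands.append(len(fragment))
    return bands


if __name__ == "__main__":
    genomic = "ATCGGAATTCTTGACCGTAGGCATGAATTCCCGTA"  # hypothetical sequence
    print(digest(genomic))
    print(southern_blot(genomic, "GACCGTAGG"))
```

An animal lacking the gene of interest would give an empty list, the equivalent of no band on the film.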

Challenges 

Even beyond the controversies involving human cloning, there are 
risks and ethical dilemmas surrounding the use of transgenic and 
cloned animals. 

One risk from cloning animals is a loss of genetic diversity in livestock. 
This could result in increased susceptibility to disease or other 
environmental challenges. Some of this risk might be avoided, 
according to the Roslin Institute, by systems that limit the number of 
clones produced by breeders and restrict the number of clones sold 
to any given farmer. 

The overexpression or deletion of certain genes must also be evaluated 
from an animal welfare perspective. The secretion of proteins in the 
milk of transgenic goats seems to have no ill effects. However, pigs 
that harbor foreign genes have exhibited many problems including 
lameness, lethargy, thickened skin, kidney dysfunction, inflamed joints, 
peptic ulcers, pericarditis, severe osteoarthritis, and a propensity 
toward pneumonia. 

The safety of cloning techniques has been questioned by a number of 
researchers. Rudolf Jaenisch (MIT) published a study in September 2002 
comparing 10,000 genes from placentas and livers of newborn cloned 
mice with those from normal mice; at least four percent were 
functioning incorrectly. Cloned mice have exhibited developmental 
abnormalities, obesity, pneumonia, liver failure, and premature death. 
Dolly exhibited arthritis at an unusually young age and was 
euthanized at age six, about half the life expectancy of sheep in captivity. 

Figure 7. DNA fragments are 
generated using restriction enzymes. 
The fragments are separated in a gel 
by the application of an electric 
charge. The fragments are then 
blotted onto a piece of nitrocellulose 
paper, where they retain their same 
pattern of separation but are 
denatured to become single-stranded 
DNA. A unique single-stranded 
portion of the gene of interest (the 
probe) is radioactively labeled and 
allowed to anneal with the blotted 
paper. When exposed to a sheet of 
photographic film, any DNA fragments 
that annealed with the labeled probe 
are identified. 

An additional concern about the use of transgenic animal products, 
including transplanted organs, is the risk of human exposure to animal 
pathogens. At least 150 pathogens are known to infect both humans 
and some other animal. In 1997 the isolation of two retroviruses from 
pigs that could infect human tissue culture cells was reported. These 
so-called PERVs (porcine endogenous retroviruses) are of special concern 
to those considering the use of porcine tissue for transplants, especially 
because some retroviruses have been associated with cancer. 

Addressing the Controversies 

Decisions made regarding the use of genetically modified organisms 
will impact the environment, and force a reexamination of consumer 
safety and animal welfare issues. Do the benefits provided by 
transgenic organisms outweigh the risks? Are those making decisions 
influenced too heavily by the profit motive? How can opportunities for 
competing approaches be ensured? 

Certainly the production of genetically altered organisms is a profit- 
making business. In 1980, individuals and companies realizing this 
sought protection of their intellectual property and turned to the 
courts. That year the U.S. Supreme Court delivered a landmark decision 
stating that living organisms are patentable; in 1988 a patent was 
issued for the genetically altered "Harvard mouse." 

In late 2001 seventy-seven scientists and teachers from sixteen 
countries, concerned with how environmental protection decisions are 
made, issued the Lowell Statement on Science and Precaution. Their 
"Precautionary Principle" recommends using the safest approaches to 
meeting society's needs, placing responsibility for finding the safest 
alternatives in the hands of those originating potentially dangerous 
activities, use of independent review, and participation of those who 
may be affected by a policy choice. These guidelines might well be 
extended beyond environmental policy. 

Governmental bodies often play the role of reviewer when it comes to 
safety, particularly of foods. Various governments and organizations 
have begun generating guidelines and recommendations regarding 
foods derived from transgenic organisms. For example, the Food and 
Agriculture Organization of the United Nations along with the World 
Health Organization organized a series of scientific consultations to 
provide their member nations with recommendations. In a January 
2001 report the consultation agreed that "the safety assessment of 
foods derived from biotechnology requires an integrated and stepwise, 
case-by-case approach." 2 

Can the population at large — by consumer and political choice — 
influence the use of genetically modified organisms? In November 
2002 Oregon was the first state in the U.S. to put labeling of 
genetically modified foods on the ballot. Proponents of labeling spent 
about 200,000 dollars to convince voters. Opponents, with funding 
from large agribusinesses, spent 5.5 million dollars to kill the idea. 
Voters were convinced that labeling would significantly increase food 
costs and rejected the measure. 

Glossary. 



Bt. The bacterium Bacillus 
thuringiensis; also refers to the 
crystalline insecticidal protein 
produced by the bacterium. Bt 
crops, such as Bt-corn, are 
transgenic plants that express the 
insecticidal protein. 

Clone. Two or more genetically 
identical progeny. Clones can be 
of genes, cells, or whole 
organisms. 

Cloning vector. A carrier of DNA 
that can replicate; usually a 
plasmid, bacteriophage, or 
eukaryotic virus. 

Electroporation. Use of electric 
shock to make cell membranes 
temporarily more permeable to 
molecules such as DNA. 

Exon. The sequence of a gene 
that encodes a protein. Exons may 
be separated by introns. 

Gene gun. A device that delivers 
DNA to cells by microprojectile 
bombardment. 

Intron. The DNA sequence within 
a gene that interrupts the protein- 
coding sequence of a gene. It is 
transcribed into RNA but it is 
removed before the RNA is 
translated into protein. 

Marker gene. A gene, such as 
one that encodes antibiotic 
resistance, that allows genetically 
modified cells to be readily 
selected. 



mRNA splicing. In eukaryotic 
cells, the process of excising 
introns from a primary RNA 
transcript and joining together 
exons to form a final mRNA 
molecule. 

Plasmid. A small, circular, self- 
replicating, extrachromosomal 
piece of DNA. Many artificially 
constructed plasmids are used as 
cloning vectors. 

Recombinant DNA. DNA that 
contains information from two or 
more different species of 
organisms. 

Restriction enzymes. Enzymes 
that cut DNA at specific 
sequences; also known as 
"restriction endonucleases." 

Reverse transcriptase. An 
enzyme derived from a retrovirus, 
which uses single-stranded RNA as 
a template for the production of 
double-stranded DNA. 

Southern blot. A technique for 
transferring DNA fragments 
separated by electrophoresis to a 
filter paper sheet. The fragments 
are then probed with a labeled, 
complementary nucleic acid to 
help determine their positions. 

Totipotent. Cells that can 
replicate to form any part of a 
complete organism. 

Transgenic organism. An 
organism that contains hereditary 
information from two different 
species of organisms. 